Program description¹



Success for All (SFA)® is a comprehensive school reform model 
that includes a reading, writing, and oral language development 
program for students in pre-kindergarten through grade eight. Its 
underlying premise is that all children can and should be reading 
at grade level by the end of third grade and then remain at grade 
level thereafter. Classroom reading instruction is delivered in daily 
90-minute blocks to students grouped by reading ability. Immediate
intervention with tutors who are certified teachers is given each
day to those students who are having difficulty reading at the same 
level as their classmates. A full-time SFA® facilitator employed by 
the school supports classroom instruction by training teachers, 
overseeing student assessments, and assisting with decisions 
about group placement and tutoring. Family Support Teams work 
on parent involvement, absenteeism, and student behavior. 



This intervention report focuses on the reading instructional 
component of SFA®, which is often implemented in the context 
of the highly structured SFA® whole school reform program. 
Although the whole school reform program has key components 
that are implemented in each school, school sites may vary con- 
siderably in the number of personnel used to implement SFA®, 
particularly tutors and family support staff. The reading curricula 
are essentially the same at all schools, with each school receiv- 
ing the same training, coaching support, and materials. Ratings 
presented in this report are not disaggregated by the variations 
in implementation of whole school reforms. Reading outcomes 
from all studies included in this report are examined together 
and form the basis for a single effectiveness rating for each
outcome domain. 



Research

One study met the WWC evidence standards and six studies
met WWC evidence standards with reservations. Altogether, the
studies included nearly 6,000 students attending more than 90 
elementary schools across the United States. The seven studies
focused on students in grades K-3 who received the SFA®
intervention for varying amounts of time.² The WWC considers
the extent of evidence for SFA® to be moderate to large for 
alphabetics, comprehension, and general reading achievement.
No studies that met WWC evidence standards with or without 
reservations addressed fluency. 



1. The descriptive information for this program was obtained from the publicly available program web site (www.successforall.net, downloaded February
2007). The WWC requests developers to review the program description sections for accuracy from their perspective. Further verification of the accuracy
of the descriptive information for this program is beyond the scope of this review.

2. The evidence presented in this report is based on available research. Findings and conclusions may change as new research becomes available.
Success for All® was found to have potentially positive effects on alphabetics and general reading achievement and mixed effects on comprehension.












                             Rating of effectiveness        Improvement index³

Alphabetics                  Potentially positive effects   Average: +13 percentile points
                                                            Range: 0 to +32 percentile points

Fluency                      na                             na

Comprehension                Mixed effects                  Average: +8 percentile points
                                                            Range: 0 to +17 percentile points

General reading achievement  Potentially positive effects   Average: +10 percentile points
                                                            Range: +2 to +22 percentile points

na = not applicable



Developer and contact 

Developed by Robert Slavin and Nancy Madden in conjunction
with the Johns Hopkins University, Success for All® is distributed
by Success for All Foundation, Inc., 200 W. Towsontown
Boulevard, Baltimore, Maryland 21204-5200. Email: sfainfo@successforall.org.
Web: www.successforall.net. Telephone: (800) 548-4998 ext. 2372.

Scope of Use 

SFA® is used by schools in 48 states, Guam, and the Virgin 
Islands. According to the Success for All Foundation, more than 
1,300 schools in over 500 districts have used the SFA® whole 
school reform program. Israel, Canada, Mexico, and Australia 
have implemented adapted versions of SFA®. 

Teaching 

During the regular daily 90-minute reading period, students 
are grouped into reading classes of 15-20 students who are all 
performing at the same reading level (regardless of age- or grade- 
level). Regrouping allows teachers to teach the whole class without 
having to break the class into multiple smaller reading groups. 

Reading teachers at every grade level begin the period by 
reading children’s literature to students. Teachers discuss the
story with students to enhance the students’ understanding of
the story and the story structure and to increase their listen- 
ing and speaking vocabulary. In kindergarten and first grade, 
teachers emphasize the development of language skills and 
use phonetically regular storybooks and instruction to focus on 
phonemic awareness, auditory discrimination, and sound blend- 
ing. In the second through fifth grade, teachers use school- or 
district-provided reading materials, either basal or trade books, 
in a structured set of interactive activities in which students 
read, discuss, and write about the books. At this stage, teachers 
emphasize cooperative learning activities built around partner 
reading. Students work on identifying characters, settings, and 
problem solutions in narratives. Students receive direct instruc- 
tion in reading comprehension skills. 

Teachers in their first year teaching SFA® receive a three-day 
summer training and 12 additional on-site support days during 
the school year. Additional in-service presentations covering 
topics such as classroom management, instructional pace, and 
cooperative learning are made by school facilitators and other 
program staff throughout the year. Facilitators organize informa- 
tion sessions to allow teachers to share problems and solutions, 
suggest changes, and discuss individual children. Twice a year, 
trainers provided by the developer visit and observe teachers. 



3. These numbers show the average and range of improvement indices for all findings across the studies. 
After the first year, training is reinforced by regular in-services,
an annual SFA® conference, and on-site implementation support
visits for school leaders and teachers. The staff development
model used in whole school SFA® reform emphasizes relatively
brief initial training with extensive classroom follow-up, coaching, 
and group discussion. 

Principals and facilitators receive five days of initial training in
leadership, data collection and progress monitoring, classroom
instructional practices, school climate, and intervention using
SFA® strategies.

Cost 

The cost of the SFA® whole school reform program is approxi- 
mately $80,000 in the first year, about $50,000 in the second 
year, and $35,000 in the third. Teacher training and ongoing
support are required and are included in the cost of the program.



Seventy-four studies reviewed by the WWC investigated the 
effects of SFA®. One study (Borman, Slavin, Cheung, Chamber- 
lain, Madden, & Chambers, 2006) was a randomized controlled 
trial that met WWC evidence standards. Six other studies 
(Dianda & Flaherty, 1995; Madden, Slavin, Karweit, Dolan, & 
Wasik, 1993; Ross, Alberg, & McNelis, 1997; Ross & Casey,
1998; Ross, McNelis, Lewis, & Loomis, 1998; and Smith, Ross,
Faulks, Casey, Shapiro, & Johnson, 1993) were quasi-experimen- 
tal designs that met WWC evidence standards with reservations. 
The remaining studies did not meet WWC evidence screens. 

Some studies measured the impact of SFA® after a cohort 
of students was exposed to SFA® for one, two, and three years. 
To determine ratings, the WWC used results from the final year 
reported in a study for the overall domain rating, prioritizing the 
outcomes that reflected students’ exposure to the intervention 
for the longest period of time available.⁴ The studies in this report
reflect results after: (1) three years of exposure to SFA® (2 stud- 
ies); (2) two years of exposure to SFA® (2 studies); and (3) one 
year of exposure to SFA® (3 studies). 

Met evidence standards 

• Borman, Slavin, Cheung, Chamberlain, Madden, & Chambers 
(2006) was a cluster randomized controlled trial that examined 
the effects of SFA® on students in grades K-2 across 14
states. The study randomly assigned 41 schools to SFA® and
the comparison conditions and presented findings on stu- 
dents who had completed one, two, or three years of the pro- 
gram compared with students who took part in their schools’ 
typical reading program. The WWC based effectiveness 
ratings on findings from the third-year longitudinal sample 
of 1,425 students who began the study in kindergarten in 18 
intervention and 17 comparison schools. 

Met evidence standards with reservations 

• Dianda and Flaherty (1995) studied the impact of SFA® on 
three different cohorts of students who started kindergarten 
in 1992, 1993, or 1994. Students were from six elementary 
schools in California. Students were grouped into four 
language categories; the WWC focuses only on the English-speaking
group of 539 students for this review.⁵ SFA®
students were compared with students who did not use the 
SFA® program. The WWC based effectiveness ratings on find- 
ings for the three cohorts who were exposed to SFA® for two, 
three, or four years. 

• Madden, Slavin, Karweit, Dolan, & Wasik (1993) evaluated the 
effects of SFA® in Baltimore City elementary schools. The 
authors evaluated three different levels of implementation
of the SFA® program: full implementation, curriculum only,

4. SFA® is designed to teach children to read at grade level by third grade, and the third year of program implementation is regarded as the full “dose” of
Success for All (Borman et al., 2006).

5. The WWC Beginning Reading topic focuses only on students learning to read in English (see the Beginning Reading Protocol).
and focus on dropout prevention.⁶ The WWC focused on the
full implementation portion of the study. Two schools that
implemented SFA® were compared with two matched
comparison schools that received a traditional reading basal 
program. The WWC based effectiveness ratings on the find- 
ings for students at the end of three years of implementation 
for the alphabetics and general reading achievement domains.

• Ross, Alberg, and McNelis (1997) included first-grade students
from 19 elementary schools implementing alternative
school-wide programs in the Northwest. The 19 schools 
were formed into four clusters of similar schools. For this 
review, the WWC reported results from students in three SFA® 
schools who were compared with the students from three 
schools that implemented the Accelerated Schools program. 
This subsample consisted of “cluster 2A” schools, which 
were neither the most disadvantaged, nor the most affluent, 
schools in the sample. This WWC review focused on the find- 
ings for 425 students at the end of the second grade, who had 
received one year of the SFA® program. 

• Ross and Casey (1998) examined the effects of SFA® in three
schools in Ft. Wayne, Indiana, by comparing them with five
schools that implemented “locally developed programs.”
The WWC focused on students who started the program in
kindergarten at two SFA® schools. The WWC based effective- 
ness ratings on the findings for 288 students at the end of first 
grade who received two years of SFA®. 



• Ross, McNelis, Lewis, & Loomis (1998) included 97 first-grade 
students from four elementary schools located in Little Rock, 
Arkansas. Two schools that implemented SFA® were
compared with two matched comparison schools that did not 
receive the intervention. The WWC based effectiveness rat- 
ings on findings at the end of the second grade after students 
received one year of SFA® implementation. 

• Smith, Ross, Faulks, Casey, Shapiro, & Johnson (1993) evalu- 
ated SFA® in two elementary schools in Ft. Wayne, Indiana, 
by comparing them with similar students in two matched 
comparison schools that did not receive SFA®. The WWC 
based effectiveness ratings on findings for 286 students 
spread across kindergarten and first grade who had received 
one year of SFA® implementation. 

Extent of evidence 

The WWC categorizes the extent of evidence in each domain as 
small or moderate to large (see the What Works Clearinghouse 
Extent of Evidence Categorization Scheme). The extent of
evidence takes into account the number of studies and the
total sample size across the studies that met WWC evidence
standards with or without reservations.⁷

The WWC considers the extent of evidence for SFA® to be 
moderate to large for alphabetics, comprehension, and general 
reading achievement. No studies that met WWC evidence stan- 
dards with or without reservations addressed fluency. 



6. The curriculum only intervention is a particular version of the SFA® program that only uses the beginning reading curriculum rather than the whole 
school reform approach (Slavin et al., 1990). The curriculum only portion of the study included only one school in the comparison condition and did not
meet WWC evidence screens. The dropout prevention portion met evidence standards with reservations but was not considered in the intervention 
rating because it went beyond the standard delivery of the program. However, results are reported in Appendices A4.7-A4.9. 

7. The Extent of Evidence Categorization was developed to tell readers how much evidence was used to determine the intervention rating, focusing on the 
number and size of studies. Additional factors associated with a related concept, external validity, such as the students’ demographics and the types of 
settings in which studies took place, are not taken into account for the categorization. 
Findings

The WWC beginning reading review addresses student 
outcomes in four domains: alphabetics, fluency, comprehension,
and general reading achievement.⁸ Studies included in
this report cover three domains: alphabetics, comprehension,
and general reading achievement. Alphabetics includes five
constructs: phonemic awareness, phonological awareness, print
awareness, letter knowledge, and phonics. Comprehension 
includes two constructs: reading comprehension and vocabulary 
development. General reading achievement includes outcome 
measures that do not explicitly differentiate among different 
reading domains (e.g., a summary standardized test score). 

The findings below present the authors’ estimates and WWC-calculated
estimates of the size and the statistical significance of
the effects on students.⁹ The results are presented by domain for
each of the SFA® studies that met the WWC evidence standards 
with or without reservations. 

Alphabetics 

In the alphabetics domain, seven studies addressed phonics 
outcomes and one of these studies also measured students’ 
letter knowledge skills. 

Three years of program implementation: 

• Borman et al. (2006) examined scores on the Woodcock Reading
Mastery Test (WRMT) and reported statistically significant
positive effects for two phonics subtests: Word Identification 
and Word Attack. The WWC analysis confirmed the statistical 
significance of these effects. 

• For each SFA® school,¹⁰ Madden et al. (1993) found statistically
significant positive effects on the phonics measure (the
Woodcock Language Proficiency Battery (WLPB) Word Attack
subtest) for preschoolers and first-graders and statistically
significant positive effects on the WLPB Letter-Word Iden- 
tification subtest for kindergarteners. The WWC found that 
none of the combined effects across schools were statistically 
significant, but the average effect size across these outcomes 
was substantively important according to WWC criteria (that 
is, an effect size of at least 0.25). 

Two years of program implementation: 

• Dianda and Flaherty (1995) reported effect sizes, but did not 
report on the statistical significance of the effect of SFA® on 
two phonics measures: the WLPB Letter-Word Identification 
subtest and the Word Attack subtest. According to WWC 
calculations, there were no statistically significant effects of 
SFA®, but the average effect size across the two measures 
was positive and large enough to be considered substantively 
important. 

• Ross and Casey (1998) reported no statistically significant 
effect of SFA® for one phonics measure (WRMT Word Iden- 
tification subtest) but found a statistically significant positive 
effect for the other phonics measure (WRMT Word Attack 
subtest). In WWC computations, neither of the effects was 
statistically significant, and the average effect was not large 
enough to be considered substantively important. 

One year of program implementation: 

• Ross, Alberg, and McNelis (1997) did not find a statistically 
significant effect of SFA® for one phonics measure (the WRMT 
Word Identification subtest), but did find a statistically sig- 
nificant positive effect for the other phonics measure (WRMT 
Word Attack subtest). The WWC analyses showed that neither 
of the effects was statistically significant. In addition, the 
average effect size across the two outcomes was neither 



8. For definitions of the domains, see the Beginning Reading Protocol.

9. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within
classrooms or schools and for multiple comparisons. For an explanation, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted
Computations for the formulas the WWC used to calculate the statistical significance. In the case of Success for All®, a correction for multiple comparisons
was needed for Borman et al. (2006). In the case of the six other studies, corrections for clustering and multiple comparisons were needed.

10. Two SFA® elementary schools were included in the analyses of third-year outcomes. 
statistically significant nor large enough to be considered
substantively important.

• Ross et al. (1998) found no statistically significant
effects of SFA® on the two phonics outcomes: WRMT Word 
Identification and Word Attack subtests. The WWC analyses 
also found that no effects were statistically significant, but the 
average effect size across outcomes was positive and large 
enough to be considered substantively important. 

• Smith et al. (1993) reported no statistically significant effect 
of SFA® on the letter knowledge construct (WRMT Letter 
Identification subtest), but found statistically significant 
positive effects for the two phonics outcomes (WRMT Word 
Identification and Word Attack subtests) for first-grade 
students. For kindergarten students, the authors found 
statistically significant positive effects for the WRMT Letter 
Identification and the Word Identification subtests. The 
WWC calculations found that although none of these effects 
were statistically significant, the average effect size across 
outcomes was positive and large enough to be substantively 
important. 

Overall, in the alphabetics domain, one study with a strong 
design showed statistically significant positive effects. Four 
studies showed substantively important positive effects and two 
studies showed indeterminate effects.¹¹
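
The labels used in these domain summaries follow a simple decision rule that can be sketched in code. The example below assumes only what this report states: an effect size of at least 0.25 counts as substantively important, and an effect that is neither statistically significant nor that large is indeterminate. The function name, the constant name, and the sample inputs are hypothetical; the report does not list per-study effect sizes in this section.

```python
from collections import Counter

# Threshold the report cites for a "substantively important" effect size.
SUBSTANTIVE_THRESHOLD = 0.25


def classify_finding(effect_size: float, statistically_significant: bool) -> str:
    """Label one study's domain-level effect (hypothetical helper)."""
    direction = "positive" if effect_size > 0 else "negative"
    if statistically_significant and effect_size != 0:
        return f"statistically significant {direction}"
    if abs(effect_size) >= SUBSTANTIVE_THRESHOLD:
        return f"substantively important {direction}"
    return "indeterminate"


# Made-up (effect size, significant?) pairs, for illustration only.
hypothetical_findings = [(0.30, True), (0.40, False), (0.10, False)]
tally = Counter(classify_finding(es, sig) for es, sig in hypothetical_findings)
print(tally)
```

Statistical significance is checked first, so a significant effect keeps that label whatever its size; the 0.25 threshold only matters for effects that are not statistically significant.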

Comprehension 

In the comprehension domain, six studies addressed reading 
comprehension outcomes, and one of these studies also mea- 
sured students’ vocabulary development skills. 

Three years of program implementation: 

• Borman et al. (2006) reported and the WWC confirmed a 
statistically significant positive effect of SFA® on the WRMT 
Passage Comprehension subtest. 



Two years of program implementation: 

• Dianda and Flaherty (1995) did not report on the statistical 
significance of the effect of SFA® on the WLPB Passage 
Comprehension subtest. The WWC found no statistically 
significant effect, but the positive effect was large enough to 
be considered substantively important according to WWC 
criteria. 

• Ross and Casey (1998) reported no statistically significant 
effect of SFA® on the WRMT Passage Comprehension 
subtest. In addition, the WWC found that the effect size was 
positive, but not substantively important. 

One year of program implementation: 

• Ross, Alberg, and McNelis (1997) reported no statistically sig- 
nificant effect on the WRMT Passage Comprehension subtest 
and the WWC found that the effect size was positive, but not 
substantively important. 

• Ross et al. (1998) reported and the WWC confirmed a positive, 
but neither statistically significant nor substantively important 
effect of SFA® on the WRMT Passage Comprehension 
subtest. 

• Smith et al. (1993) reported no statistically significant effect 
of SFA® on the vocabulary development measure (Peabody 
Picture Vocabulary Test) for kindergarteners. For first-graders, 
the study authors found a statistically significant positive 
effect on the WRMT Passage Comprehension subtest. The 
WWC analysis found that none of the effects were statistically 
significant; and the average effect size across all outcomes 
was not large enough to be considered substantively 
important. 

For the comprehension domain, one study reported a statisti- 
cally significant positive effect and had a strong design. One 
study showed substantively important positive effects, and four 
studies showed indeterminate effects. 



11. Indeterminate effects are defined as effects that are not statistically significant and with effect sizes smaller than 0.25.
The WWC found Success for All® to have potentially positive effects in the alphabetics and general reading achievement domains and mixed effects on comprehension.



General reading achievement 

Six studies examined outcomes in the general reading achievement
domain.

Three years of program implementation: 

• Dianda and Flaherty (1995) examined the effects of SFA® on
the combined measure of WLPB and Durrell Oral Reading
subtest for three cohorts of students after two to four years
of program implementation. The authors did not report on
the statistical significance of the findings. The WWC effect 
size computations found that although none of the effects 
was statistically significant, the mean effect size across all 
outcomes was positive and large enough to be considered 
substantively important. 

• For each SFA® school, Madden et al. (1993) found statistically
significant positive effects of SFA® on the Durrell Oral
Reading subtest for kindergarten and first-grade students. 
The WWC computations found that none of the positive 
effects combined across schools were statistically significant; 
but the mean effect across grade levels was large enough to 
be considered substantively important. 

Two years of program implementation: 

• Ross and Casey (1998) reported and the WWC confirmed a 
positive but neither statistically significant nor substantively 
important effect of SFA® on the Durrell Oral Reading subtest.



One year of program implementation: 

• Ross, Alberg, and McNelis (1997) reported and the WWC 
confirmed a positive but neither statistically significant nor 
substantively important effect of SFA® on the Durrell Oral
Reading subtest. 

• Smith et al. (1993) found a statistically significant positive effect 
of SFA® on the Durrell Oral Reading subtest. The WWC computations
found that the effect was not statistically significant, but
large enough to be considered substantively important. 

• Ross et al. (1998) reported and the WWC confirmed a
positive, but neither statistically significant nor substantively
important effect on the Durrell Oral Reading subtest.

In the general reading domain, three studies reported sub- 
stantively important positive effects and three studies showed 
indeterminate effects. No study had a strong design. 

Rating of effectiveness 

The WWC rates the effects of an intervention in a given outcome 
domain as positive, potentially positive, mixed, no discernible 
effects, potentially negative, or negative. The rating of effective- 
ness takes into account four factors: the quality of the research 
design, the statistical significance of the findings, the size of 
the difference between participants in the intervention and the 
comparison conditions, and the consistency in findings across 
studies (see the WWC Intervention Rating Scheme).
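
This report cites the WWC Intervention Rating Scheme without reproducing its rules. The sketch below is one reading of that scheme, reduced to the case in which no study shows a negative effect, and is offered only to illustrate how study counts could map to a rating. The function name, its parameters, and the decision rules themselves are assumptions rather than text from this report; the study counts passed in are the ones given in the domain summaries above.

```python
def rate_domain(sig_pos: int, subst_pos: int, indeterminate: int,
                strong_sig_pos: int = 0) -> str:
    """Assumed simplification of the rating rules for domains with no negative findings.

    sig_pos: studies with statistically significant positive effects
    subst_pos: studies with substantively important (not significant) positive effects
    indeterminate: studies with indeterminate effects
    strong_sig_pos: how many of the sig_pos studies had a strong design
    """
    positive = sig_pos + subst_pos
    if positive == 0:
        return "no discernible effects"
    if sig_pos >= 2 and strong_sig_pos >= 1:
        return "positive effects"
    if indeterminate <= positive:
        return "potentially positive effects"
    return "mixed effects"


# Study counts as summarized in this report, by domain.
print(rate_domain(sig_pos=1, subst_pos=4, indeterminate=2, strong_sig_pos=1))  # alphabetics
print(rate_domain(sig_pos=1, subst_pos=1, indeterminate=4, strong_sig_pos=1))  # comprehension
print(rate_domain(sig_pos=0, subst_pos=3, indeterminate=3))                    # general reading
```

Under these assumed rules, the counts reproduce the ratings the report assigns: potentially positive for alphabetics and general reading achievement, and mixed for comprehension, where indeterminate studies outnumber those with positive effects.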



Improvement index 

The WWC computes an improvement index for each individual 
finding. In addition, within each outcome domain, the WWC 
computes an average improvement index for each study and an 
average improvement index across studies (see Technical Details 
of WWC-Conducted Computations). The improvement index represents
the difference between the percentile rank of the average
student in the intervention condition versus the percentile rank of 
the average student in the comparison condition. Unlike the rating 
of effectiveness, the improvement index is based entirely on the 
size of the effect, regardless of the statistical significance of the
effect, the study design, or the analyses. The improvement index
can take on values between -50 and +50, with positive numbers 
denoting results favorable to the intervention group. 

The average improvement index for alphabetics is +13 
percentile points across the seven studies, with a range of 0 to 
+32 percentile points across findings. The average improvement 
index for comprehension is +8 percentile points across the six 
studies, with a range of 0 to +17 percentile points across find- 
ings. The average improvement index for general reading is +10 
percentile points across the six studies, with a range of +2 to +22 
percentile points across findings. 
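
The improvement index can be illustrated with a short calculation. The formula is not given in this report, which points to the Technical Details of WWC-Conducted Computations instead. The sketch below assumes the index is the standard normal percentile of the effect size minus 50, which is consistent with the stated range of -50 to +50. The function name is hypothetical.

```python
from statistics import NormalDist


def improvement_index(effect_size: float) -> float:
    """Expected percentile-rank change for an average comparison-group student.

    Assumes the comparison-group average sits at the 50th percentile and
    outcomes are normally distributed.
    """
    return NormalDist().cdf(effect_size) * 100 - 50


# An effect size of zero leaves the average student at the 50th percentile.
print(improvement_index(0.0))
# The 0.25 threshold for "substantively important" is roughly 10 percentile points.
print(round(improvement_index(0.25)))
```

Because the index depends only on the effect size, a finding can have a large improvement index and still not be statistically significant, as several of the quasi-experimental findings above show.
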
Summary

The WWC reviewed 74 studies on Success for All®. One of these 
studies met WWC evidence standards; six studies met WWC 
evidence standards with reservations; the remaining studies did 
not meet WWC evidence screens. Based on the seven studies,
the WWC found potentially positive effects in the alphabetics
and general reading achievement domains, and mixed effects 
in the comprehension domain. The evidence presented in this 
report is based on available research and may change as new
studies emerge. 
12. The outcome measures are not relevant to this review: the parameters for this WWC review specified student outcome measures but this study did not
focus on students. 

13. Does not use a strong causal design: this study was a quasi-experimental design but did not use achievement pretests to establish that the comparison 
group was equivalent to the intervention group at baseline. 

14. The sample is not appropriate to this review: the parameters for this WWC review specified that students should be in grades kindergarten through third 
grade during the time of the intervention; this study did not focus on the targeted grades. 

15. The study, which used a quasi-experimental design, reported an extreme overall attrition rate. 

16. Does not use a strong causal design: for the portion of the sample of interest to this WWC review, there was only one intervention and/or one compari- 
son unit, so the analysis could not separate the effects of the intervention from other factors. 

17. Does not use a strong causal design: this study did not use a comparison group. 

18. The sample is not appropriate to this review: the parameters for this WWC review specified that students should be in grades kindergarten through third 
grade; this study did not disaggregate students in the eligible range from those outside the range. 

19. The sample is not appropriate to this review: this study did not focus on students learning to read in English, one of the parameters for this WWC review. 

20. The sample is not appropriate to this review: the parameters for this WWC review specified student outcome measures, but this study did not focus on 
students. 

21. Does not use a strong causal design: this study, which used a quasi-experimental design, did not use equating measures to ensure that the comparison
group was equivalent to the intervention group. 

22. Does not use a strong causal design: for the portion of the sample of interest for this WWC review, there was a confound, with the intervention being 
modified or combined with other interventions, making it difficult to attribute study outcomes to the intervention. 

23. Does not use a strong causal design: this study, which used a quasi-experimental design, experienced attrition which led to possible bias in reporting. 

24. Does not use a strong causal design: for the portion of the sample of interest to this WWC review, there was only one intervention and one comparison 
unit, so the analysis could not separate the effects of the intervention from other factors. 

25. Does not use a strong causal design: this study was a quasi-experimental design but did not establish that the comparison group was equivalent to the 
intervention group at baseline. 

26. Confound: The effects of the intervention could not be separated from other factors; the impact of the agent of the intervention was confounded with the 
impact of the intervention. 
Appendix



Appendix A1.1 


Study characteristics: Borman, Slavin, Cheung, Chamberlain, Madden, & Chambers, 2006 (randomized controlled trial)


Characteristic 


Description 


Study citation 


Borman, G. D., Slavin, R. E., Cheung, A., Chamberlain, A., Madden, N., & Chambers, B. (2006). Final reading outcomes of the national randomized field trial of Success for All.
Retrieved from Success for All Web site: http://www.successforall.net/_images/pdfs/Third_Year_Results_06.doc 


Participants 


The study piloted the SFA® program in fall 2001, when three schools were randomly assigned to the SFA® and three schools to the comparison condition. In fall 2002, 35 
new schools were recruited with 18 schools randomly assigned to implement SFA® in grades K-2 and 17 schools randomly assigned to serve as comparisons.¹ The study
presented findings after the intervention students completed one, two, and three years of the program. For the effectiveness ratings, the WWC focused on findings from the 
longitudinal sample, that is, schools and students who completed three years of the program.² After three years, 18 SFA® schools with 707 students and 17 comparison
schools with 718 students remained in the longitudinal sample. 


Setting 


The analysis sample included 35 elementary schools across 14 states located in rural and small towns in the South and urban areas of the Midwest. 


Intervention 


Intervention students received the SFA® school reform program including the SFA® reading curriculum, tutoring for students’ quarterly assessments, family support teams for 
students’ parents, a facilitator who worked with school personnel, and training for all intervention teachers. Intervention schools implemented SFA® in grades K-2 and used 
their previously planned curriculum in grades 3-5. Some schools took a year to fully implement the program. 


Comparison 


Comparison schools continued using their regular, previously planned curriculum for grades K-2 (though SFA® was implemented in grades 3-5). The authors conducted
observations at all schools and found no evidence that students in grades K-2 were exposed to SFA® when it was implemented in grades 3-5. All
sample students were pretested with the Peabody Picture Vocabulary Test (PPVT) prior to SFA® implementation, and school-wide PPVT scores showed equivalence between the
program and comparison schools. Researchers also used information from the Common Core of Data (a database maintained by the National Center for Education Statistics)
at several points over the course of the study to demonstrate the equivalence between the program and comparison schools on race/ethnicity, gender, English as a second
language, special education, and free and reduced-price lunch. All equivalency tests were conducted at the school level, and no statistically significant differences were found.


Primary outcomes 
and measurement 


Three subtests of the Woodcock Reading Mastery Test were administered during the period reflected in the intervention rating: Word Identification, Word Attack, and Passage 
Comprehension.³ (See Appendices A2.1-2.3 for more detailed descriptions of outcome measures.)


Teacher training 


SFA® teachers received three days of training during the summer and approximately eight days of on-site follow-up during the first implementation year. Success for All 
Foundation trainers visited classrooms, met with groups of teachers, looked at data on children’s progress, and provided feedback to school staff on implementation quality 
and outcomes. 



1. The 17 additional comparison schools implemented SFA® in grades 3-5, but students in grades K-2, the focus of this study and the WWC review, did not receive the intervention.

2. The study provided analysis for two samples, the “longitudinal sample” which included students who participated in the program for all three years, and the “in-mover sample” which included 
the longitudinal sample plus students who transferred into the school. The WWC analysis focuses on the longitudinal sample. The WWC prioritized outcomes that reflected students’ exposure to 
the intervention for the longest period of time available. Findings reflecting students’ outcomes after shorter periods of implementation can be found in Appendices A4.1-A4.9. 

3. One additional subtest of the Woodcock Reading Mastery Test (Letter Identification) was administered during an earlier time period and is presented as an additional finding in Appendix A4.1.

Appendix A1.2 Study characteristics: Dianda & Flaherty, 1995 (quasi-experimental design)



Characteristic 


Description 


Study citation 


Dianda, M., & Flaherty, J. (1995, April). Effects of Success for All on the reading achievement of first graders in California bilingual programs. Paper presented at the annual 
meeting of the American Educational Research Association, San Francisco, CA. 


Participants 


This study involved seven elementary schools in California where the majority of students were English language learners. Six schools remained by the third year of program
implementation. Students were grouped into four language categories and received instruction in English, Spanish, or “Sheltered English.”¹ Only the English-speaking subsample
was reviewed.² The report includes three cohorts of students who began participating in the study as kindergarteners in 1992 (99 intervention and 120 comparison
students), 1993 (105 intervention and 62 comparison students), or 1994 (94 intervention and 59 comparison students), for a total of 539 participants. For the effectiveness
rating, the WWC used data that reflected students’ exposure to the intervention for the longest period of time, which varied for the different cohorts and domains.³ Exact
attrition rates are not known for this study; however, the post-attrition intervention and comparison samples were equivalent for the English-speaking subgroup. In the overall
sample, the percentage of students eligible for free lunch ranged from 70% to 98% in intervention schools and from 47% to 80% in comparison schools. The percentage of minority
students was between 50% and 70% for each study condition.


Setting 


The analysis sample included seven elementary schools in California. 


Intervention 


Intervention students received the typical SFA® curriculum including the SFA® reading curriculum, tutoring for students, quarterly assessments, family support teams for
students’ parents, a facilitator who worked with school personnel, and training for all intervention teachers. 


Comparison 


Comparison schools continued using their regular, previously planned curriculum. Each comparison school was matched with an SFA® school in the same district whose students
had similar demographics and similar pretest scores on the Peabody Picture Vocabulary Test.


Primary outcomes 
and measurement 


Three subtests of the Woodcock Language Proficiency Battery were administered: Letter-Word Identification, Word Attack, and Passage Comprehension. The authors presented
findings from each Woodcock subtest separately and also pooled findings from the Woodcock Letter-Word Identification subtests (see Appendices A2.1-2.3 for more
detailed descriptions of outcome measures).


Teacher training 


SFA® teachers received three days of training during the summer and approximately eight days of on-site follow-up during the first implementation year. Success for All
Foundation trainers visited classrooms, met with groups of teachers, looked at data on children’s progress, and provided feedback to school staff on implementation quality
and outcomes. Specially trained certified teachers or qualified aides worked one-to-one with the students.



1. English language learners participate in SFA® in English alongside their English-dominant classmates during a common period in the morning. During the rest of the day, they receive sheltered- 
content instruction or ESL instruction, depending on their level of English proficiency. 

2. The WWC Beginning Reading topic focuses only on students learning to read in English (see Beginning Reading Protocol).

3. Findings include outcomes after two years of exposure for the alphabetics and comprehension domains, and after two (1994 cohort), three (1993 cohort), and four (1992 cohort) years of exposure
for the general reading domain. Findings reflecting students’ outcomes after shorter periods of implementation can be found in Appendix A4.3.



WWC Intervention Report Success for All® August 13, 2007 





Appendix A1.3 Study characteristics: Madden, N. A., Slavin, R. E., Karweit, N., Dolan, L., & Wasik, B. A., 1993 (quasi-experimental design)



Characteristic 


Description 


Study citation 


Madden, N. A., Slavin, R. E., Karweit, N., Dolan, L., & Wasik, B. A. (1993). Success for All: Longitudinal effects of a restructuring program for inner-city elementary schools.
American Educational Research Journal, 30(1), 123-148.


Participants 


The study investigated the effects of three versions of the SFA® program: full implementation, curriculum only,¹ and dropout prevention.² The WWC focused on the full
implementation portion of the study, which included two intervention schools and two matched comparison schools. Within each comparison school, one third of the students
were randomly selected for testing purposes. The study focused on cohorts of students who started SFA® in pre-kindergarten, kindergarten, and first grade and received
the intervention for multiple years. To determine the effectiveness ratings, the WWC focused on the latest term results available. The third-year analytic sample included 268
students within two SFA® schools and 268 students within two comparison schools spread across three grade levels.³ African-American students constituted 97-99% of
students in the two intervention schools, with 83-97% of students qualifying for free lunch. In the comparison (Chapter 1) schools, at least 75% of students qualified for free lunch.


Setting 


The analysis sample included four elementary schools in Baltimore, Maryland. 


Intervention 


Intervention students received the typical SFA® program including the SFA® reading curriculum, tutoring for students in grades 1-3, quarterly assessments, family support
teams for students’ parents, a facilitator who worked with school personnel, and training for all intervention teachers. 


Comparison 


The comparison condition included schools that implemented a traditional reading program built around the Macmillan Connections basal series. Each comparison school was
matched with an intervention school based on the percentage of students receiving free or reduced-price lunch and historical achievement level. Students were then individually
matched on a standardized test given by the school district. Pretest scores on WRMT Letter-Word Identification, Word Attack, and Durrell Oral Reading subtests served as 
covariates in analyses. 


Primary outcomes 
and measurement 


Two subtests of the Woodcock Language Proficiency Battery were administered: Letter-Word Identification and Word Attack. Additional measures included Durrell Analysis of 
Reading Difficulty Silent Reading and Oral Reading subtests and the California Achievement Test (CAT) Total Reading (see Appendices A2.1-2.3 for more detailed descriptions 
of outcome measures). 


Teacher training 


The teachers and tutors were regular certified teachers. They received detailed teacher's manuals supplemented by two to three days of in-service training at the beginning of the
school year. For teachers of grades 1-3 and for reading tutors, these training sessions focused on the implementation of the reading program. Preschool and kindergarten
teachers and aides were trained in the use of the thematic units and other aspects of the preschool and kindergarten models. School facilitators also organized many information
sessions to allow teachers to share problems and solutions, suggest changes, and discuss individual children.



1. The curriculum-only portion (a version of the SFA® program that only uses the beginning reading curriculum rather than the whole school reform) of the study included only one school in the 
comparison condition making it impossible to separate the effect of the school from the effect of the regular reading curriculum. 

2. The dropout prevention version was designed to operate within schools that do not have the funding to implement the full SFA® program. The dropout prevention program had a reduced number
of tutors and family support staff. Chapter 1 monies supported the program. The dropout prevention portion is not included in the intervention rating because it differs from the standard implementation
of the program. However, findings for the dropout prevention portion of SFA® can be found in Appendices A4.7-4.9.

3. Additional findings reflecting students’ outcomes after shorter periods of implementation can be found in Appendices A4.1-A4.9, along with findings for a subsample of low-achieving students. 
Appendix A1.4 Study characteristics: Ross, Alberg, & McNelis, 1997 (quasi-experimental design)



Characteristic 


Description 


Study citation 


Ross, S. M., Alberg, M., & McNelis, M. (1997). Evaluation of elementary school school-wide programs: Clover Park School District Year 1: 1996-97. Memphis, TN: The
University of Memphis, Center for Research in Education Policy.


Participants 


The study compared whole-school improvement programs (Success for All®, Accelerated Schools, and locally developed programs) in 19 schools. Schools were divided into
four groups based on the similarity of several school characteristics, including enrollment, percentage of minority students, percentage of students eligible for free/reduced
lunch, and initial academic performance. The WWC focused on only one group, “cluster 2A,” the third highest with respect to socioeconomic status, which included three SFA®
schools and three Accelerated Schools, with a total of 252 first-grade students (148 students who attended SFA® schools; 104 students who attended Accelerated
Schools).¹ The study included data that reflected students’ outcomes after one year of program implementation. In the overall sample, the percentage of minority students in the three
intervention schools was between 47% and 63%. In the three comparison schools, the range was between 42% and 54%. The percentage of students eligible for free/reduced lunch
ranged from 63% to 66% in intervention schools and from 66% to 71% in comparison schools.


Setting 


The analysis sample included six elementary schools in Clover Park, Washington. 


Intervention 


Intervention students received the typical SFA® program including the SFA® reading curriculum, tutoring for students in grades 1-3, quarterly assessments, family support 
teams for students’ parents, a facilitator who worked with school personnel, and training for all intervention teachers. 


Comparison 


Accelerated Schools is a comprehensive school reform program that is designed to close the achievement gap between at-risk and not at-risk children. The program redesigns
and integrates curricular, instructional, and organizational practices so that they provide enrichment for at-risk students.


Primary outcomes 
and measurement 


Three subtests of the Woodcock Reading Mastery Test were administered: Word Identification, Word Attack, and Passage Comprehension. The Durrell Analysis of Reading 
Difficulty Oral Reading subtest was also used (see Appendices A2.1-2.3 for more detailed descriptions of outcome measures). 


Teacher training 


No information on training for the specific teachers in this study was provided. 



1. An additional group included one SFA® school and three comparison schools (one school used Accelerated Schools design, and the other two locally developed programs), but this comparison 
did not meet WWC evidence screens because the effect of SFA® cannot be separated from the effect of that school. 
Appendix A1.5 Study characteristics: Ross & Casey, 1998 (quasi-experimental design)



Characteristic 


Description 


Study citation 


Ross, S. M., & Casey, J. (1998). Longitudinal study of student literacy achievement in different Title 1 school-wide programs in Fort Wayne Community Schools year 2: First 
grade results. Memphis, TN: The University of Memphis, Center for Research in Education Policy. 


Participants 


This study examined the effects of SFA® in two Title 1 schools by comparing them with five other Title 1 schools that were implementing locally developed school-wide
programs.¹ The study did not report the initial sample size, but 288 students in kindergarten (83 students in the SFA® schools; 205 students in comparison schools) were
included in the final analysis sample, and the post-attrition intervention and comparison samples were equivalent on the achievement pretest measure (PPVT). The study
included data that reflected students’ outcomes after two years of program implementation.² School populations ranged between 31% and 50% minority enrollment; between
62% and 81% of students received free or reduced-price lunch.


Setting 


The analysis sample included seven Title 1 elementary schools in Fort Wayne, Indiana. 


Intervention 


Intervention students received the typical SFA® curriculum including the Reading Roots reading curriculum in grade 1 and the Reading Wings reading curriculum in grade 2; 
one-to-one tutoring for the lowest-achieving students by certified teacher tutors, quarterly assessments, family support teams for students’ parents, a facilitator who worked 
with school personnel, and training for all intervention teachers. 


Comparison 


The five comparison schools implemented locally developed school-wide programs. The schools were comparable with SFA® schools on pretest PPVT measures, socioeconomic
status, and ethnicity. Four of the five local school programs incorporated components of other branded programs, including Reading Recovery, Accelerated
Reader, Four-Block, and STAR. These curricula placed considerable emphasis on reading, use of basal readers, and multi-faceted reading activities.


Primary outcomes 
and measurement 


Three subtests of the Woodcock Reading Mastery Test were administered: Word Identification, Word Attack, and Passage Comprehension. The study presented a combined 
measure of Word Identification and Word Attack. The Durrell Analysis of Reading Difficulty Oral Reading subtest was also used (see Appendices A2.1-2.3 for more detailed 
descriptions of outcome measures). 


Teacher training 


No information on training for the specific teachers was provided in this study. 



1. The article reported on an additional intervention school that supplemented SFA® with another branded intervention (Reading Recovery), but results from this portion of the study do not meet
WWC evidence standards because the effect of SFA® cannot be separated from the effect of Reading Recovery. 

2. Additional findings for a subsample of low-achieving students (i.e., lowest 25% with respect to reading achievement) are reported in Appendices A4.1-A4.9. 
Appendix A1.6 Study characteristics: Ross, McNelis, Lewis, & Loomis, 1998 (quasi-experimental design)



Characteristic 


Description 


Study citation 


Ross, S. M., McNelis, M., Lewis, T., & Loomis, S. (1998). Evaluation of Success for All programs: Little Rock school district year 1: 1997-1998. Memphis, TN: The University of
Memphis, Center for Research in Education Policy. 


Participants 


This study involved 97 first-grade students with both pretest and posttest data in four schools. Two schools implemented the Success for All® program (40 students) and two
schools were selected as their matched comparison schools (47 students). The SFA® schools and the comparison schools were similar in poverty level, achievement level, and
enrollment. The study reported data on students' outcomes after one year of program implementation.


Setting 


The study took place in four elementary schools in Little Rock, Arkansas. 


Intervention 


Intervention students received the typical SFA® program including the SFA® reading curriculum, tutoring for students in grades 1-3, quarterly assessments, family support
teams for students’ parents, a facilitator who worked with school personnel, and training for all intervention teachers. 


Comparison 


No information was provided on the nature of the comparison curriculum. The two comparison schools were matched to the SFA® schools based on poverty level, achievement
level, and enrollment. Pretest PPVT scores were used as a covariate to adjust for differences in students’ abilities.


Primary outcomes 
and measurement 


Three subtests of the Woodcock Reading Mastery Test were administered: Word Identification, Word Attack, and Passage Comprehension. The Durrell Analysis of Reading 
Difficulty Oral Reading subtest was also used (see Appendices A2.1-2.3 for more detailed descriptions of outcome measures).


Teacher training 


No information on training for the teachers in this study was provided.

Appendix A1.7 Study characteristics: Smith, Ross, Faulks, Casey, Shapiro, & Johnson, 1993 (quasi-experimental design)



Characteristic 


Description 


Study citation 


Smith, L. J., Ross, S. M., Faulks, A., Casey, J., Shapiro, M., & Johnson, B. (1993). 1991-1992 Ft. Wayne, Indiana SFA results. Memphis, TN: The University of Memphis,
Center for Research in Education Policy. 


Participants 


This study involved approximately 286 students in kindergarten and first grade in four elementary schools in Fort Wayne, Indiana. Two schools implemented the SFA® 
program. Two comparison schools were matched to the intervention schools based on poverty level, historical achievement level, and ethnicity; then pairs of students were 
matched on PPVT pretest scores. There were 74 kindergarteners and 69 first-grade students in the intervention group and 74 kindergarteners and 69 first-grade students in 
the comparison group. Exact student attrition rates are not known for this study; however, the post-attrition intervention and comparison samples were equivalent on the achievement
pretest. School-level data (poverty level, achievement, and enrollment) were similar across all schools. The study included data on students' outcomes after one year
of program implementation.¹


Setting 


The study took place in four elementary schools in Fort Wayne, Indiana. 


Intervention 


Intervention students received the typical SFA® program including the SFA® reading curriculum, tutoring for students, quarterly assessments, family support teams for 
students’ parents, a facilitator who worked with school personnel, and training for all intervention teachers. 


Comparison 


Comparison schools continued using their regular, previously planned curriculum. No other information was provided on the comparison curriculum. 


Primary outcomes 
and measurement 


Four subtests of the Woodcock Reading Mastery Test were used: Letter Identification, Word Identification, Word Attack, and Passage Comprehension. Additional measures 
included the Peabody Picture Vocabulary Test and Durrell Analysis of Reading Difficulty Oral Reading subtest. The Merrill Language Screening Test and the Test of Language 
Development were also administered, but have not been included in this review because they were outside the scope of the Beginning Reading review (see Appendices 
A2.1-2.3 for more detailed descriptions of outcome measures). 


Teacher training 


Teachers in their first year of teaching SFA® classes received three days of summer training and two to four additional in-service days during the school year. A school facilitator
monitored and provided feedback throughout the year. Twice a year, trainers provided by the developer visited and observed teachers. After the first year, training was
reinforced by regular in-services, an annual SFA® conference, and implementation checks for the facilitators and trainers. 



1. Additional findings for a low-achieving subset of students (lowest 25% with respect to reading achievement) are presented in Appendices A4.1-A4.9.

Appendix A2.1 Outcome measures in the alphabetics domain by construct



Outcome measure 


Description 


Letter knowledge 

Woodcock Reading Mastery Test 
(WRMT): Letter Identification 
subtest 


The standardized test measures the number of letters that students are able to identify correctly (Smith et al., 1993). 


Phonics 

WRMT: Word Identification 
subtest 


The Word Identification subtest is a test of decoding skills. The standardized test requires the child to read aloud isolated real words that range in frequency and difficulty (as 
cited in Borman et al., 2006; Ross & Casey, 1998; Ross, Alberg, & McNelis, 1997; Ross et al., 1998; Smith et al., 1993). 


Woodcock Language Proficiency 
Battery (WLPB): Letter-Word 
Identification subtest 


The Letter-Word Identification subtest is a standardized test that requires the child to read aloud isolated letters and real words that range in frequency and difficulty (as cited
in Dianda & Flaherty, 1995, and Madden et al., 1993). 


WRMT and WLPB: Word Attack 
subtest 


The standardized test measures phonemic decoding skills by asking students to read pseudowords. Students are aware that the words are not real (as cited in Borman et al., 
2006; Dianda & Flaherty, 1995; Ross & Casey, 1998; Ross, Alberg, & McNelis, 1997; Ross et al., 1998; Madden et al., 1993; Smith et al., 1993). 



Appendix A2.2 Outcome measures in the comprehension domain by construct 



Outcome measure 


Description 


Reading comprehension 

WRMT and WLPB: Passage
Comprehension subtest


In this standardized test, comprehension is measured by having students fill in missing words in a short paragraph (as cited in Borman et al., 2006; Dianda & Flaherty, 1995; 
Ross & Casey, 1998; Ross, Alberg, & McNelis, 1997; Ross et al., 1998; Smith et al., 1993).


Durrell Analysis of Reading 
Difficulty (DARD): Silent Reading 
Test 


An individually-administered, standardized diagnostic test that measures reading rate while students read passages silently and answer comprehension questions (as cited in 
Madden et al., 1993). 


Vocabulary development 

Peabody Picture Vocabulary Test 
(PPVT) 


A standardized, receptive vocabulary test that asks students to choose which one of four pictures corresponds to a test word spoken aloud (as cited in Smith et al., 1993). 
Appendix A2.3 Outcome measures in the general reading domain by construct



Outcome measure 


Description 


California Achievement Test 
(CAT) Total Reading 


A group-administered, standardized assessment battery comprised of numerous reading and language-oriented subtests (as cited in Madden et al., 1993). 


DARD Oral Reading Test


An individually administered, standardized diagnostic test that measures reading accuracy, reading rate, and oral reading comprehension (as cited in Ross, Alberg, & McNelis,
1997; Ross & Casey, 1998; Ross et al., 1998; Madden et al., 1993; Smith et al., 1993). 
Appendix A3.1 Summary of findings for all domains¹

Study | Alphabetics: Letter knowledge | Alphabetics: Phonics | Comprehension: Reading comprehension | Comprehension: Vocabulary development | General reading achievement

Met evidence standards
Borman et al., 2006 | nr | + | + | nr | nr

Met evidence standards with reservations
Dianda & Flaherty, 1995 | nr | (+) | (+) | nr | (+)
Madden et al., 1993 | nr | (+) | nr | nr | (+)
Ross, Alberg, & McNelis, 1997 | nr | ind | ind | nr | ind
Ross & Casey, 1998 | nr | ind | ind | nr | ind
Ross et al., 1998 | nr | (+) | ind | nr | ind
Smith et al., 1993 | (+) | (+) | (+) | ind | (+)

Rating of effectiveness | Alphabetics: Potentially positive | Comprehension: Mixed effects | General reading achievement: Potentially positive



nr = no reported outcomes under this construct
+ = study average finding was positive and statistically significant

(+) = study average finding was positive and substantively important, but not statistically significant

ind = study average finding was indeterminate, that is, neither substantively important nor statistically significant

1. This appendix reports summary findings of study averages that were considered for the effectiveness rating and the improvement index in each domain. More detailed information on findings for all measures within the domains and the
constructs that factor into the domains can be found in Appendices A3.2-A3.4.
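
The symbols in the legend above follow a simple decision rule, which can be sketched in a few lines of Python. This is an illustrative sketch only, not the WWC's own procedure: the function name finding_symbol, the 0.25 cutoff for a "substantively important" effect, and the mirrored negative symbols are assumptions that do not appear in this appendix. The inputs are the alphabetics study averages (effect size and statistical significance) reported in Appendix A3.2.

```python
from typing import Optional

# Assumption: an effect size of at least 0.25 standard deviations counts as
# "substantively important"; this cutoff is not stated in this appendix.
SUBSTANTIVE_THRESHOLD = 0.25


def finding_symbol(effect_size: Optional[float], significant: bool = False) -> str:
    """Map a study average finding to a legend symbol (hypothetical helper)."""
    if effect_size is None:
        return "nr"    # no reported outcomes under this construct
    if significant and effect_size > 0:
        return "+"     # positive and statistically significant
    if effect_size >= SUBSTANTIVE_THRESHOLD:
        return "(+)"   # positive and substantively important, not significant
    # Assumption: negative findings mirror the positive symbols; none occur here.
    if significant and effect_size < 0:
        return "-"
    if effect_size <= -SUBSTANTIVE_THRESHOLD:
        return "(-)"
    return "ind"       # neither substantively important nor significant


# Alphabetics study averages from Appendix A3.2: (effect size, significant?)
alphabetics_averages = {
    "Borman et al., 2006": (0.28, True),
    "Madden et al., 1993": (0.55, False),
    "Dianda & Flaherty, 1995": (0.30, False),
    "Ross & Casey, 1998": (0.14, False),
    "Ross, Alberg, & McNelis, 1997": (0.13, False),
    "Ross et al., 1998": (0.31, False),
    "Smith et al., 1993": (0.56, False),
}

symbols = {
    study: finding_symbol(effect_size, significant)
    for study, (effect_size, significant) in alphabetics_averages.items()
}
```

Under these assumptions, the resulting symbols agree with the phonics column of the summary table: one "+", four "(+)", and two "ind".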
Appendix A3.2 Summary of findings for alphabetics domain¹

Columns 5-6 are the authors’ findings from the study (mean outcome, with standard deviation² in parentheses); columns 7-10 are WWC calculations.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size | Statistical significance (at α = 0.05) | Improvement index

Borman et al., 2006 (randomized controlled trial): three years of intervention
WRMT: Word ID subtest | Phonics | Kindergarten | 35/1,425 | 462.96 (23.56) | 457.41 (25.72) | 5.55 | 0.22 | Statistically significant | +9
WRMT: Word Attack subtest | Phonics | Kindergarten | 35/1,425 | 493.43 (16.45) | 487.73 (17.64) | 5.70 | 0.33 | Statistically significant | +13

Madden et al., 1993 (quasi-experimental design): three years of intervention
WLPB: Letter-Word ID subtest | Phonics | Pre-kindergarten (Cohort 1) | 4/210 | 18.25 (5.20) | 16.10 (6.69) | 2.14 | 0.36 | ns | +14
WLPB: Word Attack subtest | Phonics | Pre-kindergarten (Cohort 1) | 4/210 | 5.41 (4.25) | 2.29 (3.55) | 3.12 | 0.79 | ns | +29
WLPB: Letter-Word ID subtest | Phonics | Kindergarten (Cohort 2) | 4/148 | 24.50 (5.93) | 21.08 (6.61) | 3.42 | 0.54 | ns | +21
WLPB: Word Attack subtest | Phonics | Kindergarten (Cohort 2) | 4/148 | 7.74 (6.00) | 5.67 (4.69) | 2.08 | 0.38 | ns | +15
WLPB: Letter-Word ID subtest | Phonics | Grade 1 (Cohort 3) | 4/178 | 28.09 (7.30) | 25.28 (5.97) | 2.81 | 0.42 | ns | +16
WLPB: Word Attack subtest | Phonics | Grade 1 (Cohort 3) | 4/178 | 11.47 (7.40) | 6.52 (4.87) | 4.95 | 0.79 | ns | +18

Dianda & Flaherty, 1995 (quasi-experimental design): two years of intervention
WLPB: Letter-Word ID subtest | Phonics | English-speaking kindergarten (1992 cohort) | 7/219 | nr | nr | na | 0.34 | ns | +13
WLPB: Word Attack subtest | Phonics | English-speaking kindergarten (1992 cohort) | 7/219 | nr | nr | na | 0.26 | ns | +10

(continued)
Appendix A3.2 Summary of findings for alphabetics domain¹ (continued)

Columns 5-6 are the authors’ findings from the study (mean outcome, with standard deviation² in parentheses); columns 7-10 are WWC calculations.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size | Statistical significance (at α = 0.05) | Improvement index

Ross & Casey, 1998 (quasi-experimental design): two years of intervention
WRMT: Word ID subtest | Phonics | Kindergarten | 7/288 | 32.14 (14.63) | 31.30 (14.20) | 0.84 | 0.06 | ns | +2
WRMT: Word Attack subtest | Phonics | Kindergarten | 7/288 | 12.25 (7.36) | 10.40 (8.20) | 1.85 | 0.23 | ns | +9

Ross, Alberg, & McNelis, 1997 (quasi-experimental design): one year of intervention
WRMT: Word ID subtest | Phonics | Grade 1 | 6/252 | nr | nr | na | -0.02 | ns | 0
WRMT: Word Attack subtest | Phonics | Grade 1 | 6/252 | 18.35 | 15.86 | 2.49 (8.89) | 0.28 | ns | +11

Ross et al., 1998 (quasi-experimental design): one year of intervention
WRMT: Word ID subtest | Phonics | Grade 1 | 4/97 | 38.27 | 36.21 | 2.06 (12.31) | 0.17 | ns | +7
WRMT: Word Attack subtest | Phonics | Grade 1 | 4/97 | 15.17 | 11.19 | 3.98 (8.89) | 0.44 | ns | +17

Smith et al., 1993 (quasi-experimental design): one year of intervention
WRMT: Word ID subtest | Phonics | Kindergarten (Cohort 1) | 4/148 | 10.26 (9.82) | 3.15 (4.95) | 7.11 | 0.91 | ns | +32
WRMT: Letter ID subtest | Letter knowledge | Kindergarten (Cohort 1) | 4/148 | 32.43 (4.28) | 29.36 (7.81) | 3.07 | 0.48 | ns | +19
WRMT: Letter ID subtest | Letter knowledge | Grade 1 (Cohort 2) | 4/138 | nr | nr | na | 0.08 | ns | +3
WRMT: Word ID subtest | Phonics | Grade 1 (Cohort 2) | 4/138 | 35.04 (10.63) | 28.00 (14.70) | 7.04 | 0.55 | ns | +21
WRMT: Word Attack subtest | Phonics | Grade 1 (Cohort 2) | 4/138 | 12.60 (7.43) | 7.90 (7.91) | 4.70 | 0.61 | ns | +23

(continued)
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Appendix A3.2 Summary of findings for aiphabetics domain^ (continued) 











Authors’ findings from the study 
















Mean outcome 
(standard deviation^) 




WWC calculations 




Outcome measure 


Construct 


Study 

sample^ 


Sample size 
(schools/ 
students) 


Success 

forAlP Comparison 

group group 


Mean 

difference’' 
(SFA® - 
comparison) 


Statistical 
significance^ 
Effect size® (at a = 0.05) 


Improvement 

index^ 



Averages for alphabetics¹⁵ | Effect size | Statistical significance | Improvement index

Borman et al., 2006 - Three years of intervention | 0.28 | Statistically significant | +11
Madden et al., 1993 - Three years of intervention | 0.55 | ns | +21
Dianda & Flaherty, 1995 - Two years of intervention | 0.30 | ns | +12
Ross & Casey, 1998 - Two years of intervention | 0.14 | ns | +6
Ross, Alberg, & McNelis, 1997 - One year of intervention | 0.13 | ns | +5
Ross et al., 1998 - One year of intervention | 0.31 | ns | +12
Smith et al., 1993 - One year of intervention | 0.56 | ns | +21
Domain average for alphabetics across all studies | 0.32 | na | +13

Averages by years of SFA® implementation
Average of results from studies with three years of intervention (two studies) | 0.38 | na | +15
Average of results from studies with two years of intervention (two studies) | 0.22 | na | +9
Average of results from studies with one year of intervention (three studies) | 0.33 | na | +13



na = not applicable 

nr = not reported 

ns = not statistically significant 

1. This appendix reports findings considered for the effectiveness rating and the average improvement indices. Earlier findings from longitudinal studies are not included in these ratings, but are reported in Appendix A4.1. Subgroup findings from the studies are not included in these ratings, but are reported in Appendix A4.4.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes.

3. The cohort is defined by the time the pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten.

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group.

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations.

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index 
can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group. 

8. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools and for multiple comparisons. For an explanation about the clustering correction, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Borman et al. (2006), a correction for multiple comparisons was needed, so the significance levels may differ from those reported in the original study. There was no need to adjust for clustering because the findings were based on HLM analyses. In the case of the six other studies, corrections for both clustering and multiple comparisons were needed, so the significance levels may differ from those reported in the original studies.



9. Standard deviations and adjusted means have been received through communication with the author (G. Borman, personal communication, 2006).

10. WWC combined means and standard deviations for two SFA® schools (Abbottston and City Springs) and their counterparts. Adjusted posttest means (with pretest standard scores as covariates) were used for effect size calculations. Kindergarten and grade 1 cohorts from Abbottston elementary school received four years of intervention.

11. Authors reported effect sizes that used comparison group standard deviation in the denominator (Glass's delta). Effect size was computed by subtracting the comparison group mean from the intervention group mean and dividing the result by the comparison group standard deviation.

12. Authors reported effect sizes adjusted for PPVT pretest scores.

13. The WWC derived pooled standard deviation from the reported means and effect size.

14. Authors reported pooled standard deviation.

15. The WWC-computed average effect sizes for each study and for the domain across studies are simple averages rounded to two decimal places. The average improvement indices are calculated from the average effect sizes.
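
The averaging described in footnote 15 and the improvement index described in footnote 7 can be illustrated with a short sketch. It assumes the improvement index is the standard normal percentile of the effect size minus 50 (an assumption about the computation, since the formula itself is given only in the Technical Details document); the function and variable names are illustrative. The inputs are the study-level averages listed in the averages table for this domain.

```python
from statistics import NormalDist, mean


def improvement_index(effect_size):
    """Percentile-rank gain (in points) of the average intervention student.

    Assumption: the gain is the standard normal CDF of the effect size,
    expressed as a percentile, minus the 50th percentile.
    """
    percentile = NormalDist().cdf(effect_size) * 100
    return round(percentile - 50)


# Study-level average effect sizes for the alphabetics domain (from the table).
study_averages = {
    "Borman et al., 2006": 0.28,
    "Madden et al., 1993": 0.55,
    "Dianda & Flaherty, 1995": 0.30,
    "Ross & Casey, 1998": 0.14,
    "Ross, Alberg, & McNelis, 1997": 0.13,
    "Ross et al., 1998": 0.31,
    "Smith et al., 1993": 0.56,
}

# Simple (unweighted) average across studies, rounded to two decimal places.
domain_average = round(mean(list(study_averages.values())), 2)
print(domain_average, improvement_index(domain_average))  # 0.32 13
```

Applied to each study average, the same function returns the improvement indices shown in the table (for example, +11 for 0.28 and +21 for 0.56). Indices recomputed from rounded effect sizes may occasionally differ by a point from published values that were derived from unrounded ones.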
Appendix A3.3 Summary of findings for comprehension domain¹

Column groups: "Authors’ findings from the study" covers the mean outcome (standard deviation²) columns for the two groups; "WWC calculations" covers the mean difference, effect size, statistical significance, and improvement index columns.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷







Borman et al., 2006 (randomized controlled trial)⁸ - Three years of intervention

WRMT: Passage Comprehension subtest⁹ | Reading comprehension | Kindergarten | 35/1,425 | 481.41 (14.20) | 478.33 (15.33) | 3.08 | 0.21 | Statistically significant | +8






Dianda & Flaherty, 1995 (quasi-experimental design)⁸ - Two years of intervention

WLPB: Passage Comprehension subtest | Reading comprehension | English-speaking kindergarten (1992 cohort) | 7/219 | nr | nr | na | 0.44 | ns | +17






Ross & Casey, 1998 (quasi-experimental design)⁸ - Two years of intervention

WRMT: Passage Comprehension subtest | Reading comprehension | Kindergarten | 7/288 | 16.09 (8.46) | 15.40 (8.70) | 0.69 | 0.08 | ns | +3






Ross, Alberg, & McNelis, 1997 (quasi-experimental design)⁸ - One year of intervention

WRMT: Passage Comprehension subtest | Reading comprehension | Grade 1 | 6/252 | nr | nr | na | 0.01¹¹ | ns | 0






Ross et al., 1998 (quasi-experimental design)⁸ - One year of intervention

WRMT: Passage Comprehension subtest | Reading comprehension | Grade 1 | 4/97 | 19.19 | 17.73 | 1.46 | 0.18 | ns | +7
    (pooled standard deviation: 8.19)¹²






Smith et al., 1993 (quasi-experimental design)⁸ - One year of intervention

Peabody Picture Vocabulary Test | Vocabulary development | Kindergarten (Cohort 1) | 4/148 | nr | nr | na | 0.17¹⁰ | ns | +7
WRMT: Passage Comprehension subtest | Reading comprehension | Grade 1 (Cohort 2) | 4/136 | 16.37 (8.07) | 13.91 (9.31) | 2.46 | 0.28 | ns | +11


Averages for comprehension¹³ | Effect size | Statistical significance | Improvement index

Borman et al., 2006 - Three years of intervention | 0.21 | Statistically significant | +8
Dianda & Flaherty, 1995 - Two years of intervention | 0.44 | ns | +17
Ross & Casey, 1998 - Two years of intervention | 0.08 | ns | +3
Ross, Alberg, & McNelis, 1997 - One year of intervention | 0.01 | ns | 0
Ross et al., 1998 - One year of intervention | 0.18 | ns | +7
Smith et al., 1993 - One year of intervention | 0.23 | ns | +9
Domain average for comprehension across all studies | 0.19 | na | +8

Averages by years of SFA® implementation
Results from study with three years of intervention (one study) | 0.21 | Statistically significant | +8
Average of results from studies with two years of intervention (two studies) | 0.26 | na | +10
Average of results from studies with one year of intervention (three studies) | 0.14 | na | +6



na = not applicable 

nr = not reported 

ns = not statistically significant 

1. This appendix reports findings considered for the effectiveness rating and the average improvement indices. Earlier findings from longitudinal studies are not included in these ratings, but are reported in Appendix A4.2. Subgroup findings from the studies are not included in these ratings, but are reported in Appendix A4.5.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes.

3. The cohort is defined by the time the pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten.

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group.

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations.

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups.

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group.

8. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools and for multiple comparisons. For an explanation about the clustering correction, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Borman et al. (2006), there was no need to adjust for clustering because the findings were based on HLM analyses. In the case of Dianda & Flaherty (1995), Ross & Casey (1998), Ross, Alberg, & McNelis (1997), and Ross et al. (1998), a correction for clustering was needed, so the significance levels may differ from those reported in the original studies. In the case of Smith et al. (1993), corrections for both clustering and multiple comparisons were needed, so the significance levels may differ from those reported in the original study.

9. Standard deviations and adjusted means have been received through communication with the author. 

10. Authors reported effect sizes that used comparison group standard deviation in the denominator (Glass's delta). Effect size was computed by subtracting the comparison group mean from the intervention group mean and dividing the 
result by the comparison group standard deviation. 

11. Authors reported effect sizes adjusted for PPVT pretest scores. 

12. Authors reported pooled standard deviation. 

13. The WWC-computed average effect sizes for each study and for the domain across studies are simple averages rounded to two decimal places. The average improvement indices are calculated from the average effect sizes. 
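
Footnote 10 contrasts Glass's delta, which divides by the comparison group standard deviation, with the pooled-standard-deviation effect size used in the WWC calculations column. The sketch below compares the two for the Smith et al. (1993) Grade 1 (Cohort 2) Passage Comprehension row. It assumes the WWC effect size is Hedges' g with the usual small-sample correction, and it assumes an even 68/68 split of the 136 students because group sizes are not reported in the table; both are assumptions, and the function names are illustrative.

```python
from math import sqrt


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Mean difference divided by the pooled SD, with a small-sample correction."""
    pooled_sd = sqrt(((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2))
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)
    return correction * (mean_1 - mean_2) / pooled_sd


def glass_delta(mean_1, mean_2, sd_comparison):
    """Mean difference divided by the comparison group SD only."""
    return (mean_1 - mean_2) / sd_comparison


# Smith et al. (1993), Grade 1 (Cohort 2): SFA 16.37 (8.07), comparison 13.91 (9.31).
# ASSUMPTION: 68 students per group; the table reports only the total (136).
g = hedges_g(16.37, 8.07, 68, 13.91, 9.31, 68)
delta = glass_delta(16.37, 13.91, 9.31)
print(round(g, 2), round(delta, 2))  # 0.28 0.26
```

Under these assumptions the pooled version reproduces the 0.28 shown in the table, while Glass's delta comes out slightly smaller because the comparison group's standard deviation is the larger of the two.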
Appendix A3.4 Summary of findings for general reading achievement domain¹

Column groups: "Authors’ findings from the study" covers the mean outcome (standard deviation²) columns for the two groups; "WWC calculations" covers the mean difference, effect size, statistical significance, and improvement index columns.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷



Dianda & Flaherty, 1995 (quasi-experimental design)⁸,⁹

Four years of intervention
3 WLPB subtests and Durrell Reading subtest combined | General reading | English-speaking kindergarten (1992 cohort) | 6/136 | nr | nr | na | 0.23¹⁰ | ns | +9

Three years of intervention
3 WLPB subtests and Durrell Reading subtest combined | General reading | English-speaking kindergarten (1993 cohort) | 6/167 | nr | nr | na | 0.34¹⁰ | ns | +13

Two years of intervention
3 WLPB subtests and Durrell Reading subtest combined | General reading | English-speaking kindergarten (1994 cohort) | 6/153 | nr | nr | na | 0.27¹⁰ | ns | +11






Madden et al., 1993 (quasi-experimental design)⁹,¹¹ - Three years of intervention

Durrell Oral Reading subtest | General reading | Pre-kindergarten (Cohort 1) | 4/210 | 5.45 (4.73) | 4.46 (5.58) | 0.99 | 0.19 | ns | +8
Durrell Oral Reading subtest | General reading | Kindergarten (Cohort 2) | 4/148 | 12.35 (7.77) | 8.51 (5.06) | 3.84 | 0.58 | ns | +22
Durrell Oral Reading subtest | General reading | Grade 1 (Cohort 3) | 4/178 | 16.74 (7.07) | 12.92 (6.99) | 3.82 | 0.54 | ns | +21














Ross & Casey, 1998 (quasi-experimental design)⁹ - Two years of intervention

Durrell Oral Reading subtest | General reading | Kindergarten | 7/288 | 5.35 (4.63) | 4.70 (4.30) | 0.65 | 0.15 | ns | +6














Ross, Alberg, & McNelis, 1997 (quasi-experimental design)⁹ - One year of intervention

Durrell Oral Reading subtest | General reading | Grade 1 | 6/252 | nr | nr | na | 0.04¹² | ns | +2






Ross et al., 1998 (quasi-experimental design)⁹ - One year of intervention

Durrell Oral Reading subtest | General reading | Grade 1 | 4/97 | 7.01 | 6.46 | 0.55 | 0.16 | ns | +6
    (pooled standard deviation: 3.52)


Smith et al., 1993 (quasi-experimental design)⁹ - One year of intervention

Durrell Oral Reading subtest | General reading | Grade 1 | 4/138 | 6.74 (4.25) | 4.68 (3.83) | 2.06 | 0.51 | ns | +19








Averages for general reading achievement¹⁴ | Effect size | Statistical significance | Improvement index

Dianda & Flaherty, 1995¹⁰ - Two to four years of intervention | 0.28 | ns | +11
Madden et al., 1993 - Three years of intervention | 0.44 | ns | +17
Ross & Casey, 1998 - Two years of intervention | 0.15 | ns | +6
Ross, Alberg, & McNelis, 1997 - One year of intervention | 0.04 | ns | +2
Ross et al., 1998 - One year of intervention | 0.16 | ns | +6
Smith et al., 1993 - One year of intervention | 0.51 | ns | +19
Domain average for general reading achievement across all studies | 0.26 | na | +10

Averages by years of SFA® implementation
Results from study with four years of intervention (one study) | 0.23 | ns | +9
Average of results from studies with three years of intervention (two studies) | 0.39 | na | +15
Average of results from studies with two years of intervention (two studies) | 0.21 | ns | +8
Average of results from studies with one year of intervention (three studies) | 0.24 | na | +9



na = not applicable 

nr = not reported 

ns = not statistically significant 

1. This appendix reports findings considered for the effectiveness rating and the average improvement indices. Earlier findings from longitudinal studies are not included in these ratings, but are reported in Appendix A4.3. Subgroup findings from the studies are not included in these ratings, but are reported in Appendix A4.6.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes.

3. The cohort is defined by the time the pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten.

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group.

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations.



6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index 
can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group. 

8. Data are taken from Livingston & Flaherty (1997). 

9. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools and for multiple comparisons. For an explanation about the clustering correction, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Dianda & Flaherty (1995), Madden et al. (1993), and Smith et al. (1993), a correction for clustering and multiple comparisons was needed, so the significance levels may differ from those reported in the original studies. In the case of Ross & Casey (1998), Ross, Alberg, & McNelis (1997), and Ross et al. (1998), a correction for clustering was needed, so the significance levels may differ from those reported in the original studies.

10. Authors reported effect sizes that used comparison group standard deviation in the denominator (Glass's delta). Effect size was computed by subtracting the comparison group mean from the intervention group mean and dividing the 
result by the comparison group standard deviation. 

11. WWC combined means and standard deviations for two SFA® schools (Abbottston and City Springs) and their counterparts. Adjusted posttest means (with pretest standard scores as covariates) were used for effect size calculations. Kindergarten and grade 1 cohorts from Abbottston elementary school received four years of intervention.

12. Authors reported effect sizes adjusted for PPVT pretest scores. 

13. Authors reported pooled standard deviation. 

14. The WWC-computed average effect sizes for each study and for the domain across studies are simple averages rounded to two decimal places. The average improvement indices are calculated from the average effect sizes. 
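
Footnote 9 notes that significance levels were corrected for multiple comparisons when a study reported several outcomes in one domain. A minimal sketch of how such a correction can turn a nominally significant finding into "ns" follows. It assumes a Benjamini-Hochberg procedure, which is an assumption here because this appendix does not name the method, and the p-values are hypothetical rather than taken from any study in the table.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the finding stays significant."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value is at or below (k / m) * alpha.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff_rank = rank

    # Every finding ranked at or below the cutoff is kept as significant.
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant


# HYPOTHETICAL p-values for three outcomes from one study in one domain.
print(benjamini_hochberg([0.040, 0.045, 0.200]))  # [False, False, False]
print(benjamini_hochberg([0.010, 0.200, 0.030]))  # [True, False, True]
```

In the first call, two outcomes fall below 0.05 on their own but neither survives the correction, which is the pattern that can make corrected significance levels differ from those in the original studies.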
Appendix A4.1 Summary of earlier findings from longitudinal studies for alphabetics domain¹

Column groups: "Authors’ findings from the study" covers the mean outcome (standard deviation²) columns for the two groups; "WWC calculations" covers the mean difference, effect size, statistical significance, and improvement index columns.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷



Borman et al., 2006 (randomized controlled trial)⁸ - Two years of intervention

WRMT: Letter ID subtest | Letter knowledge | Kindergarten and Grade 1 | 38/3,353 | 451.42 (14.08) | 449.46 (11.19) | 1.96 | 0.15 | ns | +6
WRMT: Word ID subtest | Phonics | Kindergarten and Grade 1 | 38/3,353 | 449.52 (28.31) | 444.82 (29.18) | 4.70 | 0.16 | ns | +6
WRMT: Word Attack subtest | Phonics | Kindergarten and Grade 1 | 38/3,353 | 487.92 (18.20) | 483.29 (19.82) | 4.63 | 0.24 | Statistically significant | +10



ns = not statistically significant 

1. This appendix presents earlier longitudinal findings for measures that fall in the alphabetics domain. Data that reflected students' exposure to the intervention for the longest period of time were used for intervention rating purposes and are presented in Appendix A3.2.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes. 

3. The cohort is defined by the time pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten. 

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group. 

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations . 

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition and the percentile rank of the average student in the comparison condition. The improvement index can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group.

8. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools (corrections for multiple comparisons were not applied to findings not included in the overall intervention rating). For an explanation about the clustering correction, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Borman et al. (2006), there was no need to adjust for clustering because the data were based on HLM analyses.
Appendix A4.2 Summary of earlier findings from longitudinal studies for comprehension domain¹

Column groups: "Authors’ findings from the study" covers the mean outcome (standard deviation²) columns for the two groups; "WWC calculations" covers the mean difference, effect size, statistical significance, and improvement index columns.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷








Borman et al., 2006 (randomized controlled trial)⁸ - Two years of intervention

WRMT: Passage Comprehension subtest | Reading comprehension | Kindergarten and Grade 1 | 38/3,353 | 472.00 (18.29) | 469.87 (19.53) | 2.13 | 0.11 | ns | +4










ns = not statistically significant 

1. This appendix presents earlier longitudinal findings for measures that fall in the comprehension domain. Data that reflected students' exposure to the intervention for the longest period of time were used for intervention rating purposes and are presented in Appendix A3.3.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes. 

3. The cohort is defined by the time pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten. 

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group. 

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations.

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index 
can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group. 

8. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools (corrections for multiple comparisons were not done for findings 
not included in the overall intervention rating). For an explanation about the clustering correction, see the WWC Tutorial on Mismatch . See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate 
statistical significance. In the case of Borman et al. (2006), there was no need to adjust for clustering because the findings were based on HLM analyses. 
Appendix A4.3 Summary of earlier findings from longitudinal studies for general reading achievement domain¹

Column groups: "Authors’ findings from the study" covers the mean outcome (standard deviation²) columns for the two groups; "WWC calculations" covers the mean difference, effect size, statistical significance, and improvement index columns.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷



Dianda & Flaherty, 1995 (quasi-experimental design)⁸,⁹

Three years of intervention
3 WLPB subtests and Durrell Reading subtest combined | General reading | English-speaking kindergarten (1992 cohort) | 6/136 | nr | nr | na | 0.44¹⁰ | ns | +17

Two years of intervention
3 WLPB subtests and Durrell Reading subtest combined | General reading | English-speaking kindergarten (1993 cohort) | 6/167 | nr | nr | na | 0.87¹⁰ | Statistically significant | +31



na = not applicable 

nr = not reported 

ns = not statistically significant 

1. This appendix presents earlier longitudinal findings for measures that fall in the general reading domain. Data that reflected students' exposure to the intervention for the longest period of time were used for intervention rating purposes and are presented in Appendix A3.4.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes. 

3. The cohort is defined by the time pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten. 

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group. 

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations . 

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index 
can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group. 

8. Data are taken from Livingston & Flaherty (1997).

9. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools (corrections for multiple comparisons were not done for findings 
not included in the overall intervention rating). For an explanation about the clustering correction, see the WWC Tutorial on Mismatch . See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate 
statistical significance. In the case of Dianda & Flaherty (1995), a correction for clustering was needed so the significance levels may differ from those reported in the original study. 

10. Authors reported effect sizes that used comparison group standard deviation in the denominator (Glass's delta). Effect size was computed by subtracting the comparison group mean from the intervention group mean and dividing the 
result by the comparison group standard deviation. 
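
Footnote 9 says the Dianda & Flaherty (1995) significance levels were corrected for clustering of students within schools. The sketch below shows one standard adjustment for equal-sized clusters; whether it matches the WWC computation exactly is an assumption. It also assumes an intraclass correlation of 0.20, a near-even split of students between groups (group sizes are not reported), and that the unadjusted t statistic can be approximated from the reported effect size. All names are illustrative.

```python
from math import sqrt

ASSUMED_ICC = 0.20   # ASSUMPTION: intraclass correlation, not stated in this appendix
CRITICAL_T = 1.99    # approximate two-tailed critical t at alpha = 0.05 for 80 to 90 df


def t_from_effect_size(effect_size, n_1, n_2):
    """Approximate the unadjusted t statistic implied by a standardized mean difference."""
    return effect_size * sqrt(n_1 * n_2 / (n_1 + n_2))


def cluster_adjusted_t(t, n_students, n_clusters, icc=ASSUMED_ICC):
    """Return (adjusted t, adjusted degrees of freedom), assuming equal-sized clusters."""
    total = n_students
    per_cluster = n_students / n_clusters          # average students per school
    design = (per_cluster - 1) * icc
    adjusted_t = t * sqrt(((total - 2) - 2 * design) / ((total - 2) * (1 + design)))
    df = ((total - 2) - 2 * design) ** 2 / (
        (total - 2) * (1 - icc) ** 2
        + per_cluster * (total - 2 * per_cluster) * icc ** 2
        + 2 * (total - 2 * per_cluster) * icc * (1 - icc)
    )
    return adjusted_t, df


# 1992 cohort: effect size 0.44, 6 schools / 136 students (68/68 split ASSUMED).
t92, df92 = cluster_adjusted_t(t_from_effect_size(0.44, 68, 68), 136, 6)
# 1993 cohort: effect size 0.87, 6 schools / 167 students (84/83 split ASSUMED).
t93, df93 = cluster_adjusted_t(t_from_effect_size(0.87, 84, 83), 167, 6)

for label, t, df in [("1992 cohort", t92, df92), ("1993 cohort", t93, df93)]:
    verdict = "significant" if abs(t) > CRITICAL_T else "ns"
    print(f"{label}: adjusted t = {t:.2f}, df = {df:.0f}, {verdict}")
```

Under these assumptions the 1992 cohort result falls below the critical value and the 1993 cohort result stays above it, which is consistent with the "ns" and "Statistically significant" entries in the table; a different intraclass correlation or group split could change that.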
Appendix A4.4 Summary of subgroup findings for alphabetics domain¹

Column groups: "Authors’ findings from the study" covers the mean outcome (standard deviation²) columns for the two groups; "WWC calculations" covers the mean difference, effect size, statistical significance, and improvement index columns.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷







Madden et al., 1993 (quasi-experimental design)⁸,⁹ - Three years of intervention

WLPB: Letter-Word ID subtest | Phonics | Pre-kindergarten/lowest 25% (Cohort 1) | 4/54 | 16.37 (4.88) | 10.86 (5.72) | 5.51 | 1.02 | ns | +35
WLPB: Word Attack subtest | Phonics | Pre-kindergarten/lowest 25% (Cohort 1) | 4/54 | 4.55 (4.44) | 0.78 (2.41) | 3.78 | 1.04 | ns | +35
WLPB: Letter-Word ID subtest | Phonics | Kindergarten/lowest 25% (Cohort 2) | 4/38 | 21.05 (4.54) | 14.47 (6.34) | 6.58 | 1.17 | Statistically significant | +38
WLPB: Word Attack subtest | Phonics | Kindergarten/lowest 25% (Cohort 2) | 4/38 | 5.21 (3.26) | 1.84 (2.48) | 3.37 | 1.14 | ns | +37
WLPB: Letter-Word ID subtest | Phonics | Grade 1/lowest 25% (Cohort 3) | 4/44 | 24.14 (7.06) | 20.73 (4.87) | 3.41 | 0.55 | ns | +21
WLPB: Word Attack subtest | Phonics | Grade 1/lowest 25% (Cohort 3) | 4/44 | 8.27 (7.18) | 2.86 (3.93) | 5.41 | 0.92 | ns | +32






Ross & Casey, 1998 (quasi-experimental design)⁹ - Two years of intervention

WRMT: Word ID subtest | Phonics | Kindergarten/lowest 25% | 7/79 | 27.10 (14.25) | 25.10 (13.40) | 2.00 | 0.15 | ns | +6
WRMT: Word Attack subtest | Phonics | Kindergarten/lowest 25% | 7/79 | 10.11 (6.13) | 7.80 (8.10) | 2.31 | 0.30 | ns | +12






Smith et al., 1993 (quasi-experimental design)⁹ - One year of intervention

WRMT: Letter ID subtest | Letter knowledge | Kindergarten/lowest 25% (Cohort 1) | 4/38 | nr | nr | na | 0.38¹⁰ | ns | +15
WRMT: Word ID subtest | Phonics | Kindergarten/lowest 25% (Cohort 1) | 4/38 | nr | nr | na | 2.56¹⁰ | Statistically significant | +49
WRMT: Letter ID subtest | Letter knowledge | Grade 1/lowest 25% (Cohort 2) | 4/38 | nr | nr | na | -0.07¹⁰ | ns | -3
WRMT: Word ID subtest | Phonics | Grade 1/lowest 25% (Cohort 2) | 4/38 | 28.16 (10.02) | 18.53 (12.78) | 9.63 | 0.82 | ns | +29
WRMT: Word Attack subtest | Phonics | Grade 1/lowest 25% (Cohort 2) | 4/38 | 9.05 (5.37) | 4.68 (5.76) | 4.37 | 0.77 | ns | +28











na = not applicable 

nr = not reported 

ns = not statistically significant 

1. This appendix presents subgroup findings (students in the lowest 25% of their grades) for measures that fall in the alphabetics domain. Total group scores were used for rating purposes and are presented in Appendix A3.2.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes.

3. The cohort is defined by the time pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten.

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group.

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations.

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups.

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group.

8. WWC combined means and standard deviations for two SFA® schools (Abbottston and City Springs) and their counterparts. Adjusted posttest means (with pretest standard scores as covariates) were used for effect size calculations. Kindergarten and grade 1 cohorts from Abbottston elementary school received four years of intervention.

9. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools (corrections for multiple comparisons were not done for findings not included in the overall intervention rating). For an explanation about the clustering correction, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Ross & Casey (1998), Madden et al. (1993), and Smith et al. (1993), a correction for clustering was needed, so the significance levels may differ from those reported in the original study.

10. Authors reported effect sizes that used comparison group standard deviation in the denominator (Glass's delta).
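
The improvement index described in note 7 can be illustrated with a short sketch. It assumes that outcomes in the comparison group are normally distributed, so that the percentile rank of the average intervention student is the standard normal cumulative probability of the effect size; this assumption and the function names are illustrative and are not stated in this appendix.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def improvement_index(effect_size: float) -> int:
    """Percentile-point gap between the average intervention student
    and the average comparison student (who sits at the 50th percentile).

    Assumption: normally distributed outcomes, so the result always
    lies between -50 and +50.
    """
    percentile = 100.0 * normal_cdf(effect_size)
    return round(percentile - 50.0)


# Effect sizes taken from rows of this appendix.
for es in (1.04, 0.30, -0.07, 2.56):
    print(es, improvement_index(es))
```

Under that assumption, the effect sizes 1.04, 0.30, -0.07, and 2.56 map to the improvement indices +35, +12, -3, and +49 reported for the corresponding rows.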
Appendix A4.5 Summary of subgroup findings for comprehension domain¹

Columns "Success for All® group" and "Comparison group" report the authors' findings from the study as mean outcome (standard deviation²); the remaining four columns are WWC calculations.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷

Ross & Casey, 1998 (quasi-experimental design)⁸: Two years of intervention

WRMT: Passage Comprehension subtest | Reading comprehension | Kindergarten/lowest 25% | 7/79 | 12.29 (7.79) | 11.20 (8.20) | 1.09 | 0.13 | ns | +5

Smith et al., 1993 (quasi-experimental design)⁸: One year of intervention

Peabody Picture Vocabulary Test | Vocabulary development | Kindergarten/lowest 25% (Cohort 1) | 4/38 | nr | nr | na | 0.20⁹ | ns | +10
WRMT: Passage Comprehension subtest | Reading comprehension | Grade 1/lowest 25% (Cohort 2) | 4/38 | 9.84 (6.18) | 8.11 (7.13) | 1.73 | 0.25 | ns | +10


na = not applicable 

nr = not reported 

ns = not statistically significant 

1. This appendix presents subgroup findings (students in the lowest 25% of their grades) for measures that fall in the comprehension domain. Total group scores were used for rating purposes and are presented in Appendix A3.3.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes.

3. The cohort is defined by the time pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten.

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group.

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations.

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index 
can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group. 

8. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools (corrections for multiple comparisons were not done for findings not included in the overall intervention rating). For an explanation about the clustering correction, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Ross & Casey (1998) and Smith et al. (1993), a correction for clustering was needed, so the significance levels may differ from those reported in the original study.

9. Authors reported effect sizes that used comparison group standard deviation in the denominator (Glass's delta). Effect size was computed by subtracting the comparison group mean from the intervention group mean and dividing the 
result by the comparison group standard deviation. 
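
The calculation described in note 9 can be written out as a minimal sketch. The function name and the input check are illustrative choices; the example inputs are the Grade 1 Passage Comprehension means and comparison group standard deviation from this appendix, used only to show the arithmetic (the effect size reported for that row is a WWC calculation, not an author-reported Glass's delta).

```python
def glass_delta(mean_intervention: float,
                mean_comparison: float,
                sd_comparison: float) -> float:
    """Glass's delta: mean difference divided by the comparison
    group's standard deviation."""
    if sd_comparison <= 0:
        raise ValueError("comparison group standard deviation must be positive")
    return (mean_intervention - mean_comparison) / sd_comparison


# Illustrative inputs: 9.84 (intervention), 8.11 and SD 7.13 (comparison).
delta = glass_delta(9.84, 8.11, 7.13)
print(round(delta, 2))
```

Because only the comparison group's spread enters the denominator, Glass's delta can differ from an effect size standardized on both groups' standard deviations whenever the two groups differ in variability.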
Appendix A4.6 Summary of subgroup findings for general reading achievement domain¹

Columns "Success for All® group" and "Comparison group" report the authors' findings from the study as mean outcome (standard deviation²); the remaining four columns are WWC calculations.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷

Madden et al., 1993 (quasi-experimental design)⁸ ⁹: Three years of intervention

Durrell Oral Reading subtest | General reading | Pre-kindergarten/lowest 25% (Cohort 1) | 4/54 | 3.78 (4.05) | 0.97 (2.62) | 2.82 | 0.81 | ns | +29
Durrell Oral Reading subtest | General reading | Kindergarten/lowest 25% (Cohort 2) | 4/38 | 7.79 (5.25) | 4.21 (3.83) | 3.58 | 0.76 | ns | +28
Durrell Oral Reading subtest | General reading | Grade 1/lowest 25% (Cohort 3) | 4/44 | 14.00 (6.42) | 7.63 (4.89) | 6.36 | 1.10 | ns | +36

Ross & Casey, 1998 (quasi-experimental design)⁹: Two years of intervention

Durrell Oral Reading subtest | General reading | Kindergarten/lowest 25% | 7/79 | 4.14 (3.84) | 3.00 (3.60) | 1.14 | 0.31 | ns | +12



ns = not statistically significant 



1 . This appendix presents subgroup findings (students in the lowest 25% of their grades) for measures that fall in the general reading domain. Total group scores were used for rating purposes and are presented in Appendix A3.4. 

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes. 

3. The cohort is defined by the time pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten. 

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group. 

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations . 

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index 
can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group. 

8. WWC combined means and standard deviations for two SFA® schools (Abbottston and City Springs) and their counterparts. Adjusted posttest means (with pretests standard scores as covariates) were used for effect size calculations. 
Kindergarten and Grade 1 cohorts from Abbottston elementary school received four years of intervention. 

9. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools (corrections for multiple comparisons were not done for findings 
not included in the overall intervention rating). For an explanation about the clustering correction, see the WWC Tutorial on Mismatch . See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate 
statistical significance. In the case of Madden et al. (1993) and Ross & Casey (1998), a correction for clustering was needed so the significance levels may differ from those reported in the original study. 
Appendix A4.7 Summary of alternative groups findings for alphabetics domain¹

Columns "Success for All® group" and "Comparison group" report the authors' findings from the study as mean outcome (standard deviation²); the remaining four columns are WWC calculations.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷

Madden et al., 1993 (quasi-experimental design)⁸ ⁹: Dropout version, three years of intervention

WLPB: Letter-Word ID subtest | Phonics | Pre-kindergarten (Cohort 1) | 6/282 | 18.74 (5.44) | 15.77 (6.53) | 2.97 | 0.49 | ns | +19
WLPB: Word Attack subtest | Phonics | Pre-kindergarten (Cohort 1) | 6/282 | 5.50 (4.01) | 2.23 (3.56) | 3.27 | 0.86 | Statistically significant | +31
WLPB: Letter-Word ID subtest | Phonics | Kindergarten (Cohort 2) | 6/292 | 25.39 (6.89) | 21.77 (6.78) | 3.62 | 0.53 | ns | +20
WLPB: Word Attack subtest | Phonics | Kindergarten (Cohort 2) | 6/292 | 9.08 (6.37) | 4.98 (4.79) | 4.10 | 0.72 | ns | +27
WLPB: Letter-Word ID subtest | Phonics | Grade 1 (Cohort 3) | 6/232 | 29.14 (6.24) | 25.78 (6.37) | 3.36 | 0.53 | ns | +20
WLPB: Word Attack subtest | Phonics | Grade 1 (Cohort 3) | 6/232 | 10.22 (6.54) | 7.42 (5.92) | 2.81 | 0.45 | ns | +17

Madden et al., 1993 (quasi-experimental design)¹⁰: Dropout version, one year of intervention

WRMT: Combined Letter ID and Word ID subtests | Phonics | Kindergarten (Cohort 1) | 8/256 | 18.75 (5.86) | 17.46 (6.58) | 1.29 | 0.21 | ns | +8
WRMT: Word Attack subtest | Phonics | Kindergarten (Cohort 1) | 8/256 | 5.05 (4.54) | 3.77 (4.94) | 1.28 | 0.27 | ns | +11
WRMT: Word Attack subtest | Phonics | Grade 1 (Cohort 2) | 8/216 | 7.77 (5.70) | 8.41 (6.14) | -0.64 | -0.11 | ns | -4
WRMT: Word ID subtest | Phonics | Grade 1 (Cohort 2) | 8/216 | 24.95 (6.25) | 25.41 (6.41) | -0.46 | -0.07 | ns | -3
WRMT: Word Attack subtest | Phonics | Grade 2 (Cohort 3) | 8/106 | 11.52 (7.32) | 10.11 (6.07) | 1.41 | 0.21 | ns | +8
WRMT: Word ID subtest | Phonics | Grade 2 (Cohort 3) | 8/106 | 30.42 (4.82) | 28.49 (5.80) | 1.93 | 0.36 | ns | +14



ns = not statistically significant 

1. This appendix presents findings for the dropout version of SFA® for measures that fall in the alphabetics domain. Data for the full implementation model of SFA® that reflected students’ exposure to the intervention for the longest period of time were used for intervention rating purposes and are presented in Appendix A3.2.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes.

3. The cohort is defined by the time pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten.

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group.

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations.

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index 
can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group. 

8. WWC combined means and standard deviations for three SFA® schools (Dallas Nicholas, Harriet Tubman, and Dr. Bernard Harris) and their counterparts. Adjusted posttest means (with pretest standard scores as covariates) were used for effect size calculations.

9. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools (corrections for multiple comparisons were not applied to findings not included in the overall intervention rating). For an explanation about the clustering correction, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Madden et al. (1993), a correction for clustering was needed, so the significance levels may differ from those reported in the original study.

10. Data are taken from Slavin et al. (1990). 
Appendix A4.8 Summary of alternative groups findings for comprehension domain¹

Columns "Success for All® group" and "Comparison group" report the authors' findings from the study as mean outcome (standard deviation²); the remaining four columns are WWC calculations.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷

Madden et al., 1993 (quasi-experimental design)⁸ ⁹: Dropout version, one year of intervention

Durrell Silent Reading subtest | Reading comprehension | Kindergarten (Cohort 1) | 8/256 | 3.77 (3.95) | 3.50 (4.64) | 0.27 | 0.06 | ns | +2
Durrell Silent Reading subtest | Reading comprehension | Grade 1 (Cohort 2) | 8/216 | 8.42 (6.14) | 7.75 (5.20) | 0.67 | 0.12 | ns | +5
Durrell Silent Reading subtest | Reading comprehension | Grade 2 (Cohort 3) | 8/106 | 15.07 (5.25) | 11.84 (5.49) | 3.23 | 0.60 | ns | +22



ns = not statistically significant 

1. This appendix presents findings for the dropout version of SFA® for measures that fall in the comprehension domain. Data for the full implementation model of SFA® that reflected students’ exposure to the intervention for the longest period of time were used for intervention rating purposes and are presented in Appendix A3.3.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes. 

3. The cohort is defined by the time pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten. 

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group. 

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations . 

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index 
can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group. 

8. Data are taken from Slavin et al. (1990).

9. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools (corrections for multiple comparisons were not applied to findings not included in the overall intervention rating). For an explanation about the clustering correction, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Madden et al. (1993), a correction for clustering was needed, so the significance levels may differ from those reported in the original study.
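
Note 5 defers the effect size formula to a separate technical document, so the sketch below is only an assumption about its general form: a standardized mean difference using the pooled standard deviation of both groups with a small-sample correction (often called Hedges' g). The equal 53/53 split of the 106 Grade 2 students is also an assumption, because the table reports only the combined sample size.

```python
import math


def hedges_g(mean_t: float, sd_t: float, n_t: int,
             mean_c: float, sd_c: float, n_c: int) -> float:
    """Pooled-SD standardized mean difference with a small-sample
    correction. Assumed form; not confirmed by this appendix."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return d * correction


# Grade 2 (Cohort 3) Durrell Silent Reading row; group sizes are hypothetical.
g = hedges_g(15.07, 5.25, 53, 11.84, 5.49, 53)
print(round(g, 2))
```

With these assumed group sizes the result is close to the 0.60 reported for that row; rows based on adjusted posttest means may not reproduce as closely from unadjusted means.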
Appendix A4.9 Summary of alternative groups findings for general reading achievement domain¹

Columns "Success for All® group" and "Comparison group" report the authors' findings from the study as mean outcome (standard deviation²); the remaining four columns are WWC calculations.

Outcome measure | Construct | Study sample³ | Sample size (schools/students) | Success for All® group | Comparison group | Mean difference⁴ (SFA® - comparison) | Effect size⁵ | Statistical significance⁶ (at α = 0.05) | Improvement index⁷

Madden et al., 1993 (quasi-experimental design)⁸ ⁹: Dropout version, three years of intervention

Durrell Oral Reading subtest | General reading | Pre-kindergarten (Cohort 1) | 6/282 | 5.70 (4.83) | 4.11 (4.83) | 1.59 | 0.33 | ns | +13
Durrell Oral Reading subtest | General reading | Kindergarten (Cohort 2) | 6/292 | 11.81 (7.04) | 9.00 (6.50) | 2.81 | 0.41 | ns | +16
Durrell Oral Reading subtest | General reading | Grade 1 (Cohort 3) | 6/232 | 16.60 (6.97) | 13.50 (7.25) | 3.10 | 0.44 | ns | +17

Madden et al., 1993 (quasi-experimental design)⁸ ¹⁰: Dropout version, one year of intervention

CAT Total Reading | General reading | Kindergarten (Cohort 1) | 8/256 | 470.28 (105.92) | 485.13 (107.52) | -14.85 | -0.14 | ns | -6
Durrell Oral Reading subtest | General reading | Kindergarten (Cohort 1) | 8/256 | 4.69 (3.94) | 4.89 (4.03) | -0.20 | -0.05 | ns | -2
CAT Total Reading | General reading | Grade 1 (Cohort 2) | 8/216 | 348.67 (47.31) | 360.67 (49.99) | -12 | -0.25 | ns | -10
Durrell Oral Reading subtest | General reading | Grade 1 (Cohort 2) | 8/216 | 10.09 (5.74) | 9.34 (4.33) | 0.75 | 0.15 | ns | +6
CAT Total Reading | General reading | Grade 2 (Cohort 3) | 8/106 | 387.44 (36.27) | 388.15 (33.75) | -0.71 | -0.02 | ns | -1
Durrell Oral Reading subtest | General reading | Grade 2 (Cohort 3) | 8/106 | 16.02 (6.52) | 12.13 (4.22) | 3.89 | 0.70 | ns | +26



ns = not statistically significant 

1. This appendix presents findings for the dropout version of SFA® for measures that fall in the general reading achievement domain. Data for the full implementation model of SFA® that reflected students’ exposure to the intervention for the longest period of time were used for intervention rating purposes and are presented in Appendix A3.4.

2. The standard deviation across all students in each group shows how dispersed the participants’ outcomes are: a smaller standard deviation on a given measure would indicate that participants had more similar outcomes. 

3. The cohort is defined by the time pretest is administered. For example, kindergarten cohort describes students who completed pretest measures in kindergarten. 

4. Positive differences and effect sizes favor the intervention group; negative differences and effect sizes favor the comparison group. 

5. For an explanation of the effect size calculation, see Technical Details of WWC-Conducted Computations . 

6. Statistical significance is the probability that the difference between groups is a result of chance rather than a real difference between the groups. 

7. The improvement index represents the difference between the percentile rank of the average student in the intervention condition versus the percentile rank of the average student in the comparison condition. The improvement index can take on values between -50 and +50, with positive numbers denoting results favorable to the intervention group.

8. The level of statistical significance was reported by the study authors or, where necessary, calculated by the WWC to correct for clustering within classrooms or schools (corrections for multiple comparisons were not applied to findings not included in the overall intervention rating). For an explanation about the clustering correction, see the WWC Tutorial on Mismatch. See Technical Details of WWC-Conducted Computations for the formulas the WWC used to calculate statistical significance. In the case of Madden et al. (1993), a correction for clustering was needed, so the significance levels may differ from those reported in the original study.

9. WWC combined means and standard deviations for three SFA® schools (Dallas Nicholas, Harriet Tubman, and Dr. Bernard Harris) and their counterparts. Adjusted posttest means (with pretests standard scores as covariates) were used 
for effect size calculations. 

10. Data are taken from Slavin et al. (1990). 
Appendix A5.1 Success for All® rating for the alphabetics domain

The WWC rates an intervention’s effects in a given outcome domain as positive, potentially positive, mixed, no discernible effects, potentially negative, or negative.¹

For the outcome domain of alphabetics, the WWC rated Success for All® as having potentially positive effects. It did not meet the criteria for positive effects because only one study showed a statistically significant positive effect. The remaining ratings (mixed effects, no discernible effects, potentially negative effects, and negative effects) were not considered because Success for All® was assigned the highest applicable rating.

Rating received 

Potentially positive effects: Evidence of a positive effect with no overriding contrary evidence. 

• Criterion 1: At least one study showing a statistically significant or substantively important positive effect. 

Met. One study that met standards for a strong design showed a statistically significant positive effect. Four studies that met standards with 
reservations showed substantively important positive effects. 

AND 

• Criterion 2: No studies showing a statistically significant or substantively important negative effect and fewer or the same number of studies showing indeterminate 
effects than showing statistically significant or substantively important positive effects. 

Met. No studies showed statistically significant or substantively important negative effects. Two out of the seven studies showed indeterminate 
effects. 

Other ratings considered 

Positive effects: Strong evidence of a positive effect with no overriding contrary evidence. 

• Criterion 1: Two or more studies showing statistically significant positive effects, at least one of which met WWC evidence standards for a strong design. 

Not met. Only one study showed a statistically significant positive effect. 

AND 

• Criterion 2: No studies showing statistically significant or substantively important negative effects. 

Met. No studies showed statistically significant or substantively important negative effects. 

1. For rating purposes, the WWC considers the statistical significance of individual outcomes and the domain-level effect. The WWC also considers the size of the domain-level effect for ratings of potentially positive or potentially negative effects. See the WWC Intervention Rating Scheme for a complete description.
Appendix A5.2 Success for All® rating for the comprehension domain

The WWC rates an intervention’s effects in a given outcome domain as positive, potentially positive, mixed, no discernible effects, potentially negative, or negative.¹

For the outcome domain of comprehension, the WWC rated Success for All® as having mixed effects. It did not meet the criteria for positive effects because only one study showed statistically significant positive effects. In addition, it did not meet the criteria for potentially positive effects because more studies showed indeterminate effects than substantively important or statistically significant positive effects. The remaining ratings (no discernible effects, potentially negative effects, and negative effects) were not considered because Success for All® was assigned the highest applicable rating.

Rating received

Mixed effects: Evidence of inconsistent effects as demonstrated through either of the following criteria.

• Criterion 1: At least one study showing a statistically significant or substantively important positive effect, and at least one study showing a statistically significant or substantively important negative effect, but no more such studies than the number showing a statistically significant or substantively important positive effect.

Not met. No studies showed a statistically significant or substantively important negative effect. 

OR 

• Criterion 2: At least one study showing a statistically significant or substantively important effect, and more studies showing an indeterminate effect than showing a 
statistically significant or substantively important effect. 

Met. One study showed a statistically significant positive effect, one study showed a substantively important positive effect, and four studies 
showed indeterminate effects. 

Other ratings considered 

Positive effects: Strong evidence of a positive effect with no overriding contrary evidence. 

• Criterion 1: Two or more studies showing statistically significant positive effects, at least one of which met WWC evidence standards for a strong design. 

Not met. Only one study had a statistically significant positive effect in this domain. 

AND 

• Criterion 2: No studies showing statistically significant or substantively important negative effects. 

Met. No studies showed statistically significant or substantively important negative effects in this domain. 

Potentially positive effects: Evidence of a positive effect with no overriding contrary evidence. 

• Criterion 1: At least one study showing a statistically significant or substantively important positive effect.

Met. One study had a statistically significant positive effect, and one study had a substantively important positive effect in this domain. 

AND 

• Criterion 2: No studies showing a statistically significant or substantively important negative effect and fewer or the same number of studies showing indeterminate 
effects than showing statistically significant or substantively important positive effects. 

Not met. No studies showed statistically significant or substantively important negative effects in this domain, and more studies showed indeterminate effects (four) than statistically significant (one) or substantively important positive effects (one) in this domain.

1. For rating purposes, the WWC considers the statistical significance of individual outcomes and the domain-level effect. The WWC also considers the size of the domain-level effect for ratings of 
potentially positive or potentially negative effects. See the WWC Intervention Rating Scheme for a complete description. 
Appendix A5.3 Success for All® rating for the general reading achievement domain

The WWC rates an intervention’s effects in a given outcome domain as positive, potentially positive, mixed, no discernible effects, potentially negative, or negative.¹

For the outcome domain of general reading achievement, the WWC rated Success for All® as having potentially positive effects. It did not meet the criteria for positive effects because only one study showed a statistically significant positive effect. The remaining ratings (mixed effects, no discernible effects, potentially negative effects, and negative effects) were not considered because Success for All® was assigned the highest applicable rating.

Rating received 

Potentially positive effects: Evidence of a positive effect with no overriding contrary evidence. 

• Criterion 1: At least one study showing a statistically significant or substantively important positive effect. 

Met. Three studies showed substantively important positive effects. 

AND 

• Criterion 2: No studies showing a statistically significant or substantively important negative effect and fewer or the same number of studies showing indeterminate 
effects than showing statistically significant or substantively important positive effects. 

Met. No studies showed statistically significant or substantively important negative effects. Three studies showed indeterminate effects and three 
studies showed substantively important positive effects. 

Other ratings considered 

Positive effects: Strong evidence of a positive effect with no overriding contrary evidence. 

• Criterion 1: Two or more studies showing statistically significant positive effects, at least one of which met WWC evidence standards for a strong design. 

Not met. No studies showed a statistically significant positive effect. 

AND 

• Criterion 2: No studies showing statistically significant or substantively important negative effects. 

Met. No studies showed statistically significant or substantively important negative effects. 

1. For rating purposes, the WWC considers the statistical significance of individual outcomes and the domain-level effect. The WWC also considers the size of the domain-level effect for ratings of 
potentially positive or potentially negative effects. See the WWC Intervention Rating Scheme for a complete description. 
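
The criteria quoted in Appendices A5.1 through A5.3 can be sketched as a simple decision function. This is a simplified illustration, not the WWC Intervention Rating Scheme itself: it covers only the three ratings whose criteria are spelled out above, it ignores the domain-level effect mentioned in note 1, and the function and parameter names are hypothetical.

```python
def rate_domain(sig_pos: int, subst_pos: int, indeterminate: int,
                negative: int, sig_pos_strong_design: int) -> str:
    """Classify a domain from study counts.

    sig_pos: studies with statistically significant positive effects
    subst_pos: studies with substantively important (not significant) positive effects
    indeterminate: studies with indeterminate effects
    negative: studies with significant or substantively important negative effects
    sig_pos_strong_design: how many of sig_pos met standards for a strong design
    """
    positive = sig_pos + subst_pos
    # Positive effects: both criteria must hold.
    if sig_pos >= 2 and sig_pos_strong_design >= 1 and negative == 0:
        return "positive"
    # Potentially positive effects: both criteria must hold.
    if positive >= 1 and negative == 0 and indeterminate <= positive:
        return "potentially positive"
    # Mixed effects: either criterion is enough.
    criterion_1 = positive >= 1 and 1 <= negative <= positive
    with_effect = positive + negative
    criterion_2 = with_effect >= 1 and indeterminate > with_effect
    if criterion_1 or criterion_2:
        return "mixed"
    return "not covered by the criteria quoted here"


print(rate_domain(1, 4, 2, 0, 1))  # alphabetics counts from Appendix A5.1
print(rate_domain(1, 1, 4, 0, 1))  # comprehension counts from Appendix A5.2
print(rate_domain(0, 3, 3, 0, 0))  # general reading counts from Appendix A5.3
```

Checking the ratings from highest to lowest mirrors the statement that lower ratings were not considered once the highest applicable rating was assigned. For the comprehension call, the number of strong-design studies is a placeholder, since it does not affect the outcome when only one study is statistically significant.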
Appendix A6 Extent of evidence by domain

Outcome domain | Number of studies | Sample size: schools | Sample size: students | Extent of evidence¹
Alphabetics | 7 | 67 | 3,103 | Moderate to large
Fluency | 0 | 0 | 0 | na
Comprehension | 6 | 65 | 2,565 | Moderate to large
General reading achievement | 6 | 31 | 1,767 | Moderate to large




na = not applicable/not studied 

1. A rating of “moderate to large” requires at least two studies and two schools across studies in one domain and a total sample size across studies of at least 350 students or 14 classrooms. Otherwise, the rating is “small.”
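
The extent-of-evidence rule in note 1 can be expressed as a short sketch. The function name and the optional classroom count are illustrative; returning "na" when no studies exist mirrors the Fluency row of the table rather than anything stated in the note.

```python
from typing import Optional


def extent_of_evidence(n_studies: int, n_schools: int, n_students: int,
                       n_classrooms: Optional[int] = None) -> str:
    """Apply the 'moderate to large' versus 'small' rule from note 1."""
    if n_studies == 0:
        return "na"  # domain not studied, as in the Fluency row
    enough_sample = n_students >= 350 or (
        n_classrooms is not None and n_classrooms >= 14
    )
    if n_studies >= 2 and n_schools >= 2 and enough_sample:
        return "moderate to large"
    return "small"


# Rows of the table above: (studies, schools, students).
for name, row in {
    "Alphabetics": (7, 67, 3103),
    "Fluency": (0, 0, 0),
    "Comprehension": (6, 65, 2565),
    "General reading achievement": (6, 31, 1767),
}.items():
    print(name, extent_of_evidence(*row))
```

Classroom counts are not reported in the table, so the example relies on the student threshold alone.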